Background

Transcriptional gene regulation is one of the most important mechanisms in controlling many essential cellular processes, including cell development, cell-cycle control, and the cellular response to variations in environmental conditions. Genes are regulated by transcription factors and other genes/proteins via a complex interconnection network. Such regulatory links may be predicted using microarray expression data, but most regulation models suppose transcription factor independence, which leads to spurious links when many genes have highly correlated expression levels.

Results

We propose a new algorithm to infer combinatorial control networks from gene-expression data. Based on a simple model of combinatorial gene regulation, it includes a message-passing approach which avoids explicit sampling over putative gene-regulatory networks. This algorithm is shown to recover the structure of a simple artificial cell-cycle network model for baker's yeast. It is then applied to a large-scale yeast gene expression dataset in order to identify combinatorial regulations, and to a data set of direct medical interest, namely the Pleiotropic Drug Resistance (PDR) network.
Background

Transcriptional gene regulation is one of the key mechanisms in living cells; the control of gene expression is crucial in processes such as cell development, cell-cycle regulation, and response to external stimuli [-]. While the number of sequenced genomes is growing rapidly, it becomes more and more important to study genetic information on a higher level, i.e. to understand genes in their interdependence and to capture relations between regulatory genes, e.g. transcription factors (TF) or signaling proteins, and regulated genes via the reconstruction of gene-regulatory networks (GRN). Direct experimental approaches to understanding gene regulation are costly and time-consuming. Therefore genome-scale regulatory networks are only known for E. coli  and for baker's yeast, S. cerevisiae [,].
For higher organisms, the knowledge is restricted to intensively studied small functional modules, see e.g. . Some characteristic features of these GRN are:
• Directionality: Regulatory control is directed from regulators to regulated genes.
• Sparsity: Each single gene is controlled by a limited number of other genes, which is small compared to the total gene content (and also to the total number of TFs) of an organism.
• Combinatorial control: The expression of a gene may depend on the joint activity of various regulatory proteins.
The last item is crucial, and it is the topic of very active and diversified research [-]. One example of combinatorial control in yeast is the case of transcription factors Yrr1 and Yrm1, which compete for occupancy of the same promoter sequence .
Many other types of combined control exist, such as the formation of hetero- or homo-dimers by TFs, or their post-translational modification by other proteins, which can entirely change their targets . On the other hand, the hypothesis of sparsity has been experimentally checked in well-studied organisms, where it has been observed that the number of TFs is low compared to the total number of genes. It is tempting to ask to what extent GRN can be reconstructed from gene-expression data. After the advent of the first generation of gene-expression microarrays, more than a decade ago , we face a growing number of new high-throughput technologies capable of simultaneously monitoring the concentrations of thousands of cellular components, in particular of mRNAs. The improved quality of new generations of microarrays, the decrease of their cost, and the number of experiments accumulated so far call for the development of large-scale methods of data analysis.
Different modeling approaches have been proposed (see  for a recent review), ranging from a coarse-grained description of co-regulated genes  and classification methods [,], through Boolean descriptions where genes are described in terms of logical switches with only on/off states of activity  (see in particular  for the problem of inference of Boolean networks), to more realistic systems of differential equations describing the kinetic details . For GRN reconstruction, too, approaches from different origins have been proposed: system control theory [-], Bayesian inference [-], and information theory [-]. Many limitations of the existing algorithms arise directly from the quantity and quality of data: microarrays are noisy averages over cell populations, and the number of available arrays is normally much smaller than the number of probes measured in each array. Moreover, microarrays measure mRNA but not active protein concentrations (which, for TFs, are the important parameters). Both may be uncorrelated in the cell . But as proteomics data are even sparser than microarray data, this problem is not easy to solve, and many modeling approaches use mRNA concentration alone.
Another problem is the existence of combinatorial control in gene regulation: predicting such cases is an NP-complete problem, and has therefore eluded many approaches due to computational complexity, although some recent and interesting progress has been achieved in . In this paper we introduce a novel algorithmic strategy, based on message-passing techniques, to infer the regulatory network of an organism based solely on genome-wide expression data, with a specific focus on combinatorial control. Our methodology is probabilistic and distributed, allowing for a fast exploration of the space of networks. We apply the algorithm to three yeast networks: (i) To test the efficiency of the algorithm, we first reconstruct an in-silico regulatory network for cell-cycle control from artificially generated data . (ii) We propose a large-scale reconstruction of the yeast regulatory network, using the classic Gasch microarray dataset , and analyze evidence for combinatorial control. (iii) We use yeast expression data from the SMD database  to recover the regulations affecting genes involved in pleiotropic drug resistance (PDR).
This network is now under intense scrutiny because of the increasingly common nosocomial infections by Candida yeasts , which are able to resist drugs by exporting them out of the cell. These resistance mechanisms are genetically regulated by the PDR network, which we aim to reconstruct. A detailed description of the algorithm is given in the Methods section. An implementation in C can be downloaded at .

Reconstructing an in-silico yeast cell-cycle network

Before coming to biological data, we test our approach on the network model of Tang et al.  for cell cycle regulation in S. cerevisiae.
The cell cycle is regulated by cyclin/CDK complexes, which sequentially activate and inhibit each other, creating a periodicity which is the clock of the cell. Recently, sequential waves of transcriptional activation independent of cyclin activation have been discovered [,], but they are not taken into account in the model. The model nevertheless serves as an ideal starting point for the performance analysis of our algorithm, since the data-generating network is explicitly known and can be compared to our inferred regulatory interactions. (2) Our aim here is to infer the regulatory links of this network model based on the different state vectors $\mathbf{s}_t$. The above in-silico dynamics shows 7 fixed points, i.e. stationary states of the dynamics. Each fixed point can be characterized by the size of its basin of attraction, i.e. by the number of random initial conditions that end on it. The authors of  argue that the fixed point with the largest basin of attraction can be identified with the G1 phase of the cell cycle.
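The notions of fixed point and basin of attraction can be illustrated with a minimal sketch. The update rule of the model is not reproduced in this text, so the code below assumes a synchronous threshold rule of the kind commonly used for Boolean networks (a node switches on for positive net input, off for negative net input, and keeps its value otherwise). The 3-node weight matrix `TOY_WEIGHTS` is purely hypothetical and is not the cell-cycle network.

```python
from itertools import product


def step(state, weights):
    """One synchronous update under the assumed threshold rule."""
    new_state = []
    for i, s_i in enumerate(state):
        total = sum(w * s_j for w, s_j in zip(weights[i], state))
        if total > 0:
            new_state.append(1)
        elif total < 0:
            new_state.append(0)
        else:
            new_state.append(s_i)  # no net input: keep the current value
    return tuple(new_state)


def basin_sizes(weights):
    """Map each fixed point to the number of initial states that end on it."""
    n = len(weights)
    basins = {}
    for start in product((0, 1), repeat=n):
        seen = set()
        state = start
        while state not in seen:  # follow the trajectory until it repeats
            seen.add(state)
            state = step(state, weights)
        if step(state, weights) == state:  # ended on a fixed point, not a cycle
            basins[state] = basins.get(state, 0) + 1
    return basins


# Hypothetical toy network: TOY_WEIGHTS[i][j] is the effect of node j on node i.
TOY_WEIGHTS = [
    [0, -1, 0],
    [1, 0, -1],
    [0, 1, 0],
]

if __name__ == "__main__":
    for fixed_point, size in sorted(basin_sizes(TOY_WEIGHTS).items()):
        print(fixed_point, size)
```

Exhaustive enumeration is only feasible for small networks, since the number of initial conditions doubles with every added node.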
If one perturbs the stationary G1 state by flipping the Cln3 cyclin to its active value, the network passes through 13 different states before reaching G1 again. The authors of  argue that this trajectory robustly reproduces various aspects of the yeast cell cycle. We test our algorithm on two different data sets: (i) the 13 states obtained by first flipping the Cln3 cyclin to the active value and letting the system evolve until stationarity as described before, and (ii) a larger data set containing the configurations of data set (i) and additionally the trajectories obtained by evolving all configurations at Hamming distance 1 away from G1 (70 different states). In Additional File we include both data sets together with the links of the network. In order to deal with time series, Eq. (9) for the prior probability distribution is transformed into, to express the conditional probability of the target gene 0 at time t + 1 given the expression profile of the other genes at time t. For both data sets we fix the diluting field h to a value giving $N_{\mathrm{eff}} \sim 30$ according to Eq.
For the original data set (i) we fix σ D 0. For the larger data set (ii), convergence of belief propagation (BP) is ensured by σ D = 0.3. In Fig we display the precision-recall curve for the network inferred using BP, for both the cell-cycle and the perturbed cell-cycle data sets (cf. the paragraph about observables in Methods for a precise definition of precision and recall). Results are compared to the performance of a co-expression network which ranks links j → i according to the Pearson correlation of the expression profiles of genes j and i. We see that on the original data set BP is able to correctly infer 11 links before making the first error, whereas Pearson correlation already fails after two correctly predicted links.
This result shows that BP correctly manages to take into account combinatorial control effects, which cannot be seen by purely local methods (such as pair correlations). Increasing the data set improves the outcome of BP: the larger data set leads to 16 correctly predicted links before the precision drops below one, and the precision always stays above the one obtained from the 13-state trajectory. It is also interesting to note that the first links inferred by our algorithm are those which were identified in  as essential for reproducing the cell cycle by a complete enumeration of the space of all networks.
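The comparison above relies on two simple operations: ranking candidate links by the absolute Pearson correlation, and scoring a ranked list against a known network. A minimal sketch of both is given below. The function names and the layout of the expression matrix (genes in rows, samples in columns) are assumptions made for illustration, and the sketch correlates profiles at equal times rather than across consecutive time steps.

```python
import numpy as np


def rank_links_by_correlation(expr):
    """Return directed links (j, i), j != i, sorted by decreasing |Pearson r|."""
    corr = np.corrcoef(expr)  # genes x genes correlation matrix
    n = corr.shape[0]
    links = [(j, i) for j in range(n) for i in range(n) if j != i]
    links.sort(key=lambda link: -abs(corr[link[0], link[1]]))
    return links


def precision_recall_curve(ranked_links, true_links):
    """Precision and recall after each successive prediction in the ranking."""
    true_links = set(true_links)
    hits = 0
    curve = []
    for k, link in enumerate(ranked_links, start=1):
        hits += link in true_links
        curve.append((hits / k, hits / len(true_links)))
    return curve


def correct_before_first_error(ranked_links, true_links):
    """Number of top-ranked links that are correct before the first false one."""
    true_links = set(true_links)
    count = 0
    for link in ranked_links:
        if link not in true_links:
            break
        count += 1
    return count
```

Because the Pearson correlation is symmetric, this ranking cannot distinguish j → i from i → j; both directions receive the same score.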
Yeast response to environmental stresses

For a second application of BP, at a much larger scale, we use the data of Gasch et al. , which consist of 172 genome-wide microarrays of S. cerevisiae under different environmental conditions.
We filter out all genes that show little differential expression (variance smaller than three times the minimal variance measured) or that miss more than 10 data points. Thereby the gene number is reduced to 2659 target genes, i.e. to roughly half of the entire genome. As putative regulators we consider (i) genes annotated as transcription factors or structurally similar to known transcription factors, and (ii) genes involved in signaling : their total number sums up to 460 putative inputs. We run our algorithm with σ = 0.25, which equals the minimal variance of a gene found in the full data set. Since BP gives probabilistic results, we kept regulatory links with more than 95% confidence. As the distribution of the marginal probabilities follows a power-law distribution (data not shown), changing this threshold (e.g. going to 99% or 90%) has little effect on the final network.
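The gene filter described above can be sketched as follows. This is only an illustration under stated assumptions: the expression matrix has genes in rows and arrays in columns, missing data points are encoded as NaN, and the function name `filter_genes` and its parameter names are hypothetical.

```python
import numpy as np


def filter_genes(expr, variance_factor=3.0, max_missing=10):
    """Return a boolean mask selecting the genes that pass both criteria.

    expr: 2-D array (genes x arrays), with NaN marking missing data points.
    A gene is kept if its variance is at least `variance_factor` times the
    smallest variance in the data set and it misses at most `max_missing`
    data points.
    """
    expr = np.asarray(expr, dtype=float)
    n_missing = np.isnan(expr).sum(axis=1)
    variances = np.nanvar(expr, axis=1)   # variance over the observed values
    min_variance = np.nanmin(variances)
    enough_variance = variances >= variance_factor * min_variance
    few_missing = n_missing <= max_missing
    return enough_variance & few_missing
```

Note that with this criterion the gene with the smallest variance is always removed, and that a data set containing a gene of zero variance would make the variance criterion vacuous.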
The network contains 5779 regulatory links, giving an average of 2.17 links per target; the in-connectivity k has a distribution best fitted by an exponential law $P(k) = C e^{-\gamma k}$ with γ = 0.42, a value very close to the reference one in . Only 182 target genes (7%) have no predicted regulator.
Moreover, 1637 targets (62%) are regulated by at least 2 genes, providing a wealth of potential predictions in the field of combinatorial control. Interestingly, the finding of 2.17 links per target can be compared with the result of Balaji et al. , based on a review of ChIP-chip experiments, reporting a comparable average value of 2.9 regulators per target.

Combinatorial control

In order to assess the relevance of the inferred network, we compare it first to a network based on pairwise correlations of expression data (co-expression network), which was constructed to have the same number of links as the BP network. Selected links are those with the highest absolute value of the Pearson correlation among all input-output gene pairs. This is clearly an oversimplified model, but it allows one to grasp the significant features of our model.
One advantage of our algorithm is the explicit inference of combinatorial control mechanisms by multiple transcription factors. Indeed, the number of genes with multiple regulators inferred using our methodology is 1637, while it is only 612 in the case of the pairwise-correlation network.
The average number of regulators per regulated gene (i.e. genes with at least one inferred regulator) in our BP case is 2.33, to be compared to 2.9 from the work of Balaji et al.  and 6.17 for the co-expression network. It is interesting to note that the BP result is closer to the experimental network than the co-expression one.
This feature shows how, for the vast majority of target genes, our algorithm is able to describe the behavior of the gene by combining few putative regulators. Another way of investigating combinatorial control is to compare expression profiles of different regulators. Regulators having highly correlated expression profiles carry similar information to the target gene, whereas regulators having diverse profiles can be used to transmit much more information.
This is directly incorporated in our model: the sparsity term introduced in Eq. 5 reduces the effect of potential regulators whose expression profiles are highly correlated. As a limiting example let us consider two input genes with identical expression profiles, regulating one target gene. The sparsity term will randomly select only one of the two, and identify it as a regulator. In more realistic cases, no two genes show exactly the same expression, and only the most explanatory gene will be chosen as a regulator out of a set of highly correlated potential TFs. To quantify the independent information carried by each regulator we compute, as a simple measure, one minus the Pearson correlation coefficient between any two regulators of common target genes, see Fig. One can see that the information content is much higher using our methodology than with simple co-expression, because the latter tends to discover redundant information, as displayed in the example of Fig for the target gene YDR518W.
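The redundancy measure used above, one minus the Pearson correlation between regulators sharing a target, can be sketched in a few lines. The data layout (a genes-by-samples array plus a dictionary mapping each target to the row indices of its regulators) and the function name are assumptions made for this illustration.

```python
from itertools import combinations

import numpy as np


def regulator_independence(expr, regulators_of):
    """Return 1 - Pearson r for every pair of regulators of a common target.

    expr: 2-D array (genes x samples).
    regulators_of: dict mapping a target label to the row indices of its regulators.
    Values near 0 indicate redundant regulators; values near 2 indicate
    anti-correlated ones.
    """
    scores = []
    for target, regs in regulators_of.items():
        for a, b in combinations(sorted(regs), 2):
            r = np.corrcoef(expr[a], expr[b])[0, 1]
            scores.append(1.0 - r)
    return scores
```

Targets with a single regulator contribute no pair and therefore no score, so the resulting distribution only describes genes under putative combinatorial control.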
This specific example also shows that secondary regulators found by BP tend to correct discrepancies between the first regulator and the target gene.

Comparison to experimental TF binding data

In order to further investigate the significance of the BP inferred network, we compare it to the experimentally verified network presented by Balaji et al. , which is characterized by 158 TFs, 4411 target genes, and 12974 regulatory links between them. After filtering out genes with low variance in the expression data set, the set of analyzed genes consists of 1919 targets and 132 TFs. The number of experimentally verified links between these genes consequently reduces to 5533. Again we run BP with σ = 0.25, which equals the minimal variance of a gene found in the full data set, and we keep regulatory links with more than 90% confidence. The resulting network has 6914 directed edges.
Since these edges describe logical implications between gene expression levels, it is not clear to what extent they reflect physical binding between the TF related to the input gene and the promoter sequence of the target gene. It is easy to imagine that co-regulated genes are discovered as predicting each other, or that secondary targets in regulatory cascades are recognized as direct targets. In fact, the overlap with the experimentally verified network is only 206 edges (the resulting network is provided in Additional File ). In order to give a statistical assessment of this number, we compare it to the overlap with a null model: we scramble the links in the BP network randomly while preserving the in-degree of the inferred network.
The overlap with the null model is 176 ± 5.3 edges, implying a z-score of 5.5 and a p-value of $1.18 \times 10^{-8}$ (under the hypothesis that the distribution of overlaps is Gaussian with mean and variance given by the null model). To check the effect of an increased number of experiments, we downloaded 1013 microarrays from the Stanford Microarray Database (SMD) .
Now 2614 target genes and 157 regulatory genes pass the statistical test, and the coverage of the experimental network increases to 7635 links. With respect to Gasch's data set, we use a 6-fold higher number of arrays coming from different experiments, so we run BP at a higher noise value σ = 1.5. The resulting BP network has 16176 edges (around three times the number of edges inferred with the Gasch data set alone). The overlap with the experimentally verified network is 406 edges (the resulting network is provided in Additional File ). The overlap with the null model is 314 ± 7.9 edges. Thus we find a z-score of 11.6 and a p-value of $1.6 \times 10^{-31}$.
As a comparison, we analyzed the same data set and the same set of 157 potential transcription factors with the ARACNe software . To obtain statistically similar networks we set the data processing inequality threshold (a tunable parameter for controlling the overall number of edges in the network) to 0.10: the resulting network has 19775 directed edges (note that ARACNe produces undirected links). The overlap with the experimentally verified network is 480 edges (data in Additional Files). The overlap with the null model is 424 ± 9.8 edges, with a z-score of 5.7 and a p-value of $3.0 \times 10^{-9}$. The appreciable increase of statistical significance with respect to the results using Gasch's data is encouraging: it indicates, in quantitative form, that larger numbers of microarrays would allow for extracting substantially more information about regulatory processes from gene expression data.
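The statistical assessment used in this section (overlap with a reference network, compared to a null model that rewires links while preserving each target's in-degree) can be sketched as follows. The function names are hypothetical, links are represented as (regulator, target) pairs, and the p-value is computed as a one-sided Gaussian tail, which is an assumption about how the Gaussian hypothesis is applied.

```python
import math
import random


def overlap(links, reference):
    """Number of links shared by two networks given as (regulator, target) pairs."""
    return len(set(links) & set(reference))


def shuffle_preserving_in_degree(links, regulators, rng):
    """Rewire every target to random distinct regulators, keeping its in-degree."""
    in_degree = {}
    for _, target in links:
        in_degree[target] = in_degree.get(target, 0) + 1
    shuffled = []
    for target, k in in_degree.items():
        for reg in rng.sample(regulators, k):
            shuffled.append((reg, target))
    return shuffled


def overlap_z_score(links, reference, regulators, n_shuffles=1000, seed=0):
    """z-score and one-sided Gaussian p-value of the observed overlap."""
    rng = random.Random(seed)
    observed = overlap(links, reference)
    null = [
        overlap(shuffle_preserving_in_degree(links, regulators, rng), reference)
        for _ in range(n_shuffles)
    ]
    mean = sum(null) / n_shuffles
    std = math.sqrt(sum((v - mean) ** 2 for v in null) / n_shuffles)
    z = (observed - mean) / std  # fails if every shuffle gives the same overlap
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

Here `regulators` must be a list containing at least as many candidates as the largest in-degree, since each target draws its regulators without replacement.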
Inference of the PDR network

We finally apply our algorithm to a small dataset, to tackle an issue of direct medical relevance: drug resistance among yeasts. S. cerevisiae is able to resist many drugs, using an ensemble of genes connected in the 'pleiotropic drug resistance' network. The basic mechanism is that these genes, regulated by the master regulator PDR1, can export a broad range of substances out of the cell, drugs included. This general feature has been discovered in many organisms, and is considered a generic and robust mechanism of drug resistance, from bacteria to yeasts . The precise regulations acting in this network are still unknown, even though numerous works have already uncovered a part of them [-]. Here we propose to look for combinatorial regulations in this network, in order to better understand how transcription factors dedicated to drug resistance collaborate to ensure cell survival in harsh conditions, that is, in the presence of drugs. We run our algorithm on 40 genes known to be involved in PDR processes as targets (selection was based on the literature), and use all 157 transcription factors annotated in the database YEASTRACT  as potential regulators.
The expression data consist of 912 microarrays from SMD . Due to its small size, the statistical properties of the inferred network (see Fig ) are quite different from those of the global one: 265 links were inferred at 95% confidence, giving a high average of 6.625 regulators per regulated gene.
All target genes had at least one regulator; in fact only one had a single regulator (the GIS1 → STB5 couple).

Figure: The PDR regulatory network inferred by BP, comprising 157 TFs and 40 targets. Targets are shown in grey.

Again, as a comparison, we analyzed the same data set with ARACNe. To obtain statistically similar networks, we set the data processing inequality threshold to 0.10: 247 links were inferred (note that ARACNe produces undirected links). Both networks are provided in Additional File.
As a first observation we note that 13 out of the 40 target genes appear not to be regulated in the ARACNe network. We can conclude that, at least in this case, ARACNe seems to produce links which are concentrated on a smaller number of targets, with an in-degree of 9.14 ± 6.6 TFs per regulated target (to be compared with the BP result of 6.625 ± 3.6). Compared to the latest version of YEASTRACT, we find the following numbers of overlapping links: 16 in our case (if we consider the TF → target direction), and 28 if the direction is not taken into account. ARACNe, which produces an undirected network, has only 22 overlapping links. We also compared our findings with the network presented in the work of Balaji et al. : in the BP case we match 8 directed edges and 15 undirected ones, whereas ARACNe matches 9 undirected links. Moreover, a closer look at some predicted cases of combinatorial control gives interesting insights into the biology of drug resistance.
In particular, we find RPN4, a transcriptional regulator of the proteasome, regulated by both PDR3 and YAP1. This interaction between drug resistance and the proteasome was already hinted at in previous works concerning global stress resistance , and was recently proved experimentally . This case is not found when running ARACNe on the same dataset, emphasizing the need for specially designed algorithms in order to uncover new cases of combinatorial control. Another interesting case of combinatorial regulation predicted in this analysis is the cross regulation of YAP1 and RAS1 by PDR1, PDR3 and RPN4. This complex regulation could therefore link drug resistance and proteasome regulation to the processes of cell aging and proliferation, regulated by RAS1.
However, to the best of our knowledge there is no experimental evidence of this link, which remains to be confirmed.

Conclusions

In this work, we have presented an efficient method for genome-wide inference of regulatory networks, particularly designed to take into account cases of genetic combinatorial control. The method, based on message passing, was tested on a small in-silico model for the cell-cycle regulation in yeast, and then applied to both a large-scale and a small-scale dataset.
The test shows the accuracy of the method in the case of informative data, and the applications predict meaningful network structures. One relevant feature of our algorithm is its capability of unveiling patterns of combinatorial control. Even if the model of gene regulation we used (linear superposition of inputs, followed by a non-linear function) is very simple, it allows for regulators which account only for part of the target expression, and which may be corrected for by other regulators under other conditions, cf. . From the algorithmic point of view, our methodology allows one to explore combinatorially the full space of regulatory networks while keeping the computational time short.
The flexibility of the approach allows for integrating other types of data: to give an example, information about putative transcription factor binding sites in the regulatory region of an output gene can be easily integrated via a transcription-factor dependent diluting field h. Finally, our method can be generalized to tackle a variety of issues in the field of gene regulation inference. One possibility is to try to discover new regulators by a corrective methodology, starting with a known regulatory network and looking for the most relevant regulations to be added to this network. Another possibility is to use the information on combinatorial control in conjunction with the nature of the expression data to explain which conditions allow which combinatorial controls, opening the door to a wealth of genetic experiments and to a better understanding of the complexity of gene regulation.
Data encoding

Gene expression data are encoded into an (N + 1) × M input matrix whose entries are indexed by i = 0, 1, ..., N and μ = 1, ..., M, where M is the number of experiments (arrays) and N + 1 is the number of genes. Each entry is a real number that quantifies the level of expression of gene i in sample μ; more precisely, it is the log-ratio of the actual expression of gene i and the expression of the same gene in a reference condition. A negative (positive) value indicates the under- (over-)expression of gene i in sample μ with respect to the reference. Here we use the vectorial notation to indicate expression pattern μ. The task is the reconstruction of a network model which may explain these data. Using a statistical-physics analogy, starting from some snapshots of the microscopic state of a system one tries to infer the energy function (Hamiltonian) governing its behavior. Note that due to the directed nature of gene networks this task can be formally factorized over regulated genes: we can ask first which genes have a regulatory influence on gene 0, and how they interact combinatorially.
Then we ask the same question for the regulators of genes 1, 2, ..., N. To further simplify the possible influence other genes can have on target gene 0, we aim at a ternary classification of the influence of a gene i on 0, encoded in a coupling $J_{i \to 0}$ that can take three values. This classification scheme is clearly an oversimplification with respect to biological reality, where a whole range of positive and negative interaction strengths is expected. On the other hand, given the peculiar restriction posed by the limited number of available expression patterns, having a simple but meaningful model reduces the risk of overfitting and produces results which are easier to interpret.
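As an illustration of the encoding and of the factorization over regulated genes, the sketch below builds a log-ratio matrix and extracts the data for one target at a time. It is only a sketch under stated assumptions: the base-2 logarithm, a single reference intensity per gene, and the function names `encode_log_ratios` and `split_target` are choices made here, not specifications taken from the text.

```python
import numpy as np


def encode_log_ratios(intensity, reference):
    """Return the (genes x samples) matrix of log2(intensity / reference).

    intensity: positive expression levels, one row per gene, one column per sample.
    reference: positive expression level of each gene in the reference condition.
    """
    intensity = np.asarray(intensity, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.log2(intensity / reference[:, None])


def split_target(x, target):
    """Treat row `target` as gene 0: return its profile and the other genes' profiles."""
    others = np.delete(x, target, axis=0)
    return x[target], others


if __name__ == "__main__":
    x = encode_log_ratios([[2.0, 4.0], [1.0, 0.5], [8.0, 1.0]], [1.0, 1.0, 2.0])
    # The inference problem is posed independently for every target gene.
    for target in range(x.shape[0]):
        profile, inputs = split_target(x, target)
        print(target, profile, inputs.shape)
```

Because each target is handled independently, the per-target problems could also be distributed over several processes.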
Our algorithm can be easily extended to include more than three values for the $J_{i \to 0}$; in most cases we have analyzed, this generalization does not increase the predictive power.

Belief Propagation

The belief propagation (BP) algorithm is exact on tree-like graphical models, but it has been extensively used as a heuristic procedure to solve problems defined on sparse graphs [,]. Recently, the same approach has been shown to be a good approximation also for problems with dense graph structure [-]. BP is an iterative algorithm for estimating marginal probability distributions. It works by locally exchanging messages until global consistency is achieved.
The messages sent between variable nodes i (couplings) and function nodes μ (constraints) are:
• The probability $\rho_{\mu \to i}(J_{i \to 0})$ that constraint μ forces variable i to assume value $J_{i \to 0}$.
• The probability $P_{i \to \mu}(J_{i \to 0})$ that variable i takes value $J_{i \to 0}$ in the absence of constraint μ.

Computational complexity

By means of the Gaussian approximation, the complexity of Eq. (12) is reduced from $O(3^N)$ to $O(N)$, and that of the overall iteration to $O(MN)$. The apparent complexity $O(MN^2)$ of updating $MN$ messages in time $O(N)$ can be reduced to $O(MN)$ by a simple trick: the sums in Eqs. (16) can be calculated over all j once for each μ, so only the contribution of i has to be removed in the update of $\rho_{\mu \to i}$ for each i. This allows a single update step to be performed in constant time. A precise estimate of the overall complexity of the algorithm would require controlling the scaling of the number of iterations needed for convergence. A theoretical analysis of BP convergence times in a general setting remains elusive. Some recent progress for the simpler matching problem can be found in . In all the simulations presented in this work, convergence is always reached in fewer than 50 iterations. It would be interesting to compare the efficiency of our algorithm with the computational strategy proposed in , based on a Monte Carlo Markov Chain (MCMC) sampler over the model space.
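The leave-one-out trick mentioned in the complexity discussion can be illustrated on a generic array of per-link contributions. In the sketch below, `a[mu, j]` stands for whatever term enters the sums over j; the actual terms of the BP equations are not reproduced here, and both function names are hypothetical.

```python
import numpy as np


def leave_one_out_naive(a):
    """For every (mu, i), sum a[mu, j] over all j != i: O(M * N^2) operations."""
    m, n = a.shape
    out = np.empty_like(a)
    for mu in range(m):
        for i in range(n):
            out[mu, i] = sum(a[mu, j] for j in range(n) if j != i)
    return out


def leave_one_out_fast(a):
    """Same result in O(M * N): one full sum per mu, minus the contribution of i."""
    totals = a.sum(axis=1, keepdims=True)  # computed once for each mu
    return totals - a


if __name__ == "__main__":
    a = np.arange(12, dtype=float).reshape(3, 4)
    print(np.allclose(leave_one_out_naive(a), leave_one_out_fast(a)))
```

Subtracting a term from a precomputed total can lose floating-point precision when one contribution dominates the sum, which is a practical point to keep in mind when applying this kind of shortcut.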
In our experience, however, MCMC methods generally have some intrinsic problems, mainly due to the fact that the convergence (or mixing) time is hard to assess and is often exponential.

Observables

Marginals - We do not aim at constructing a single high-scoring coupling vector J as in a maximum-likelihood approach.
Depending on the shape of the probability space, this vector might be very different from the one actually generating the data. We are instead interested in characterizing the ensemble of all high-scoring vectors, or more precisely in the marginal probabilities $P_i(J_{i \to 0})$, which tell us how frequently the coupling from i to 0 takes value $J_{i \to 0}$. We can therefore base a global ranking of all potential couplings i → 0 on the probabilities $1 - P_i(J_{i \to 0} = 0)$ of being non-zero. (17) The objective of inference is predicting a fraction of all couplings with high precision, i.e. to have as high a number of true positives (TP) as possible with a low number of false positives (FP).
The quality of the inference can be assessed by comparing recall (or sensitivity) $RC = N_{TP} / (N_{TP} + N_{FN})$ and precision (or positive predictive value) $PR = N_{TP} / (N_{TP} + N_{FP})$. The recall describes the fraction of all existing non-zero couplings which are recovered by the algorithm, whereas the precision tells us the fraction of all predicted links that are actually present in the data generator.

Parameter fixing and zero-entropy criterion

The diluting field h is the conjugate variable of the number of effective links, so we can equivalently fix one of the two quantities. One can decide to fix the number of effective links, and thus the size of the searched gene signature, and to choose h accordingly. To find the correct value of h we apply a cooling procedure where, after each iteration step of the BP equations, we increase (resp. decrease) h depending on whether the effective number of links is higher (resp. lower) than the desired value. Since the true number of relevant genes is an unknown quantity, the chosen value for $N_{\mathrm{eff}}$ is itself a free parameter. In practice, in the cooling procedure of the h field, we monitor the value of the entropy and we stop the iteration as soon as it becomes lower than zero, i.e. at the point where we are able to restrict the number of possible solutions to our problem to a sub-exponential number (remember that the entropy here indicates the logarithm of the number of solutions). Upon a further increase of h the entropy becomes negative, and no zero-energy solution is found at that value of the dilution parameter h. In all our simulations we have taken the limit β → ∞.
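The feedback rule for the diluting field can be sketched in isolation from the BP equations. In the toy example below, the probability that a link is non-zero is modeled by a logistic function of a per-link score minus h; this is a stand-in chosen for illustration, not the BP marginal of the model, and the entropy-based stopping criterion is omitted. The names `effective_links` and `tune_diluting_field`, the fixed step size, and the iteration count are all assumptions.

```python
import math


def effective_links(evidence, h):
    """Toy stand-in for BP: expected number of non-zero links at field h."""
    return sum(1.0 / (1.0 + math.exp(h - e)) for e in evidence)


def tune_diluting_field(evidence, target, step=0.01, n_iter=5000, h=0.0):
    """Raise h when there are too many effective links, lower it when too few."""
    for _ in range(n_iter):
        n_eff = effective_links(evidence, h)
        if n_eff > target:
            h += step  # too many links: dilute more strongly
        elif n_eff < target:
            h -= step  # too few links: dilute less
    return h, effective_links(evidence, h)


if __name__ == "__main__":
    scores = [0.5 * k for k in range(-10, 10)]  # hypothetical per-link scores
    h, n_eff = tune_diluting_field(scores, target=5.0)
    print(round(h, 2), round(n_eff, 2))
```

With a fixed step the field ends up oscillating around the value that gives the desired number of effective links, so the step size sets the final accuracy.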
The incidence of fungal infections in immuno-compromised patients has increased considerably over the last 30 years. New treatments are therefore needed against pathogenic fungi.
With Candida albicans as a model, the study of host-fungal pathogen interactions might reveal new sources of therapies. Transcription factors (TF) are of interest since they integrate signals from the host environment and participate in an adapted microbial response. TFs of the Zn2-Cys6 class are specific to fungi and are important regulators of fungal metabolism. This work analyzed the importance of the C. albicans Zn2-Cys6 TFs for mouse kidney colonization. For this purpose, 77 Zn2-Cys6 TF mutants were screened in a systemic mouse model of infection in pools of 10 mutants. We developed a simple barcoding strategy to specifically detect the DNA of each mutant in mouse kidneys by quantitative PCR.
Among the 77 TF mutant strains tested, eight showed decreased colonization (mutants for orf19.3405, orf19.255, orf19.5133, RGT1, UGA3, orf19.6182, SEF1 and orf19.2646) and four showed increased colonization (mutants for orf19.4166, ZFU2, orf19.1685 and UPC2) as compared to the isogenic wild-type strain. Our approach was validated by comparable results obtained with the same animal model using a single mutant and the revertant for an ORF (orf19.2646) with still unknown functions. In an attempt to identify a putative involvement of such TFs in already known C. albicans virulence mechanisms, we determined their in vitro susceptibility to pH, heat and oxidative stresses, as well as their ability to produce hyphae and invade agar. A poor correlation was found between in vitro and in vivo assays, suggesting that TFs needed for mouse kidney colonization may involve still unknown mechanisms. This large-scale analysis of mouse organ colonization by C. albicans can now be extended to other mutant libraries, since our in vivo screening strategy can be adapted to any preexisting mutants.
Introduction Candida albicans is an ubiquitous dimorphic organism colonizing skin and mucosa of immuno-competent people without causing any pathologies. In contrast, it is the cause of a wide spectrum of diseases in immuno-compromized patients, ranging from benign mucosal infections such as oral thrush to disseminated candidiasis, which can be fatal. Oral and vaginal infections with C. Albicans are extremely common even in weakly immuno-compromized individuals. In severe cases of immunodeficiency, C. Albicans penetrates into deeper tissues and may enter the bloodstream. From the bloodstream, the fungus has the potential to invade almost all body sites and organs.
Therefore, C. albicans is able to survive in radically different environments with dramatic changes in physico-chemical conditions such as oxygen and carbon dioxide tension, pH and temperature.
In addition, it has to escape host immune defenses, even if they are weakened in immuno-compromised patients. The therapy of C. albicans infections necessitates the use of antifungal agents acting essentially against ergosterol, DNA and cell wall biosynthesis, as described in detail in several reviews. The exposure of fungal pathogens to antifungal agents can engage mechanisms enabling their adaptation and finally resulting in drug resistance that is associated with treatment failure.
C. albicans resistance to antifungal agents is often the result of genetic alterations such as gain-of-function mutations or chromosomal rearrangements (for review see ), or of the formation of multicellular associations known as biofilms. In the case of C. albicans systemic infections, treatment failures are relatively frequent even in the absence of in vitro measurable antifungal resistance in specific fungal strains, leading to a mortality rate of up to 30%. Therefore, it seems unlikely that existing antifungal agents will have an impact on the mortality rate of systemic fungal infections. A better understanding of host-pathogen interactions is necessary to develop new drugs and/or new clinical approaches. These new therapeutic approaches may be combined with existing antifungal chemotherapies and immunotherapies.
Several studies have addressed this question in the past but focused only on the host immune response involving chemokines, cytokines and effector cells. These studies often included analyses of a restricted number of genes or explored the transcriptome essentially by ex vivo experiments, and one by in vivo analysis. All these analyses yielded a limited perspective on the interactions between C. albicans and its host during the infectious process. Large-scale reverse genetic analyses of C. albicans genes involved in environmental adaptation have been performed in vitro, and thus are not appropriate for the study of host-pathogen interactions.
Very recently, two studies performed large-scale in vivo analyses of C. albicans genes involved in the infectious process. Nevertheless, these analyses did not focus on a specific class of genes and were not exhaustive, or investigated heterozygous mutants of essential genes, in which the effect of deleting only one gene copy may not be visible.
Their implementation was time-consuming and costly and could not be adapted to a pre-existing C. albicans mutant library. In this study we propose a simple way to screen a collection of C. albicans Zn2-Cys6 transcription factor (TF) mutants directly for host colonization in a murine disseminated infection model. TFs integrate several signals originating from the environment and mediate adapted responses by modulating gene transcription.
Blocking a TF involved in the response to a specific host condition may disarm the adapted fungal response that otherwise allows its survival. Previous contributions confirm this hypothesis and demonstrate that in C. albicans, deletion of genes encoding TFs results in a significant decrease of virulence. These TFs are involved in (i) adaptation to oxidative stress (CAP1 and RIM101), (ii) nitrogen regulation (GAT1), and (iii) yeast to hyphae transition (CPH1, EFG1, TUP1, and CaNDT80). Thus, TFs represent interesting candidates to study virulence in order to design new antifungal strategies. Among existing C. albicans TFs, the Zn2-Cys6 “zinc cluster” subclass is of particular interest since it is fungal-specific.
In order to address the involvement of TFs of this family in host colonization, we developed a system in which C. albicans TF mutants are analysed in pools of 10 tagged strains. After C. albicans DNA extraction from mouse organs, quantitative PCR (qPCR) is performed on specific tags and thus allows the relative quantification of each individual mutant in the population in infected tissues. We tested 77 mutants of specific Zn2-Cys6 TFs for their potential to colonize mouse kidneys in a mouse model of systemic infection. We were able to distinguish 8 and 4 mutants with reduced and increased colonization, respectively, as compared to an isogenic wild type strain.
In an attempt to identify the putative function of these TFs and to better elucidate their involvement in virulence and/or colonization, we determined in vitro susceptibilities to heat, pH and oxidative stresses as well as hyphae formation and agar invasion of these 77 strains. These phenotypic tests revealed an overall poor correlation with in vivo results, suggesting that most of these TFs may regulate the expression of yet unidentified virulence factors. Strains and media The C. albicans strains used in this study are listed in. These mutants are part of a larger collection of transcription factor mutants available at the Fungal Genetic Stock Center. They were selected for the presence of a fungal Zn2-Cys6 binuclear cluster DNA-binding domain (Pfam accession number PF00172).
The mutants were constructed by four different gene inactivation strategies, using two different parent strains, CAF4-2 and BWP17, which were both used as control strains throughout this study. Each TF mutant was transformed with a plasmid containing a barcode and was then renamed BCYi for “BarCoded Yeast” number i. Isolates were grown in complete medium YEPD (1% Bacto peptone (Difco Laboratories, Basel, Switzerland), 0.5% yeast extract (Difco) and 2% glucose (Fluka, Buchs, Switzerland)) or in minimal medium YNB (Yeast Nitrogen Base) (Difco) and 2% glucose (Fluka). When grown on solid media, 2% agar (Difco) was added. Escherichia coli DH5α was used as a host for plasmid constructions and propagation.
DH5α was grown in LB (Luria-Bertani broth) or LB plates, supplemented with ampicillin (0.1 mg/ml) when required. Phenotypic tests Eight different in vitro phenotypes, putatively relevant for virulence and/or colonization in the host, were assessed for each of our mutants by comparison with their wild type parental strains.
These phenotypic tests were performed by spotting serial dilutions of fungal cultures onto solid YEPD-based agar plates. Yeast cultures were grown overnight in liquid YEPD and diluted to a density of 1.5×10⁵ cells/ml. Two serial 10-fold dilutions were performed to a final dilution containing 1.5×10³ cells/ml. Four microliters of each dilution were spotted onto YEPD-based plates and incubated for 24 to 72 h depending on the condition tested. Following this large-scale screening, a refined screen was performed for mutants with a phenotype differing from the wild type. For that purpose, the same conditions were used except that six serial 5-fold dilutions were spotted on agar plates. First, heat susceptibility was tested by incubation at 42°C as compared to 35°C, which corresponds to the reference condition.
Second, we determined the susceptibility of our mutants to alkaline pH (pH 8.3 was achieved by adding 50 mM of HEPES pH 8.5 and 25 mM NaOH) or acidic pH (pH 3.35 was achieved by adding 50 mM of HEPES pH 1.9). Third, we assessed the susceptibility to 1 mM and 5 mM H₂O₂. Lastly, the filamentation phenotype was determined by plating each mutant onto YEPD agar plates supplemented with 10% fetal calf serum (FCS), by growing cells in liquid YEPD medium with 10% FCS for 4 h, and by determining agar invasion capacity after 48 h of growth on YEPD agar plates. All plates were recorded for phenotypes after 24 and 48 h of incubation except for FCS-supplemented plates, which were analyzed after 72 h of incubation. For cells grown in liquid medium, pictures were taken using an Axiovert200 Zeiss inverted microscope with a 200× magnification. This screening was repeated twice from independent overnight cultures to assess the reproducibility of the results.
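The spotting scheme of the primary screen (a culture at 1.5×10⁵ cells/ml, two serial 10-fold dilutions, 4 µl per spot) implies a rough number of cells deposited per spot. The following is a minimal sketch of that arithmetic only; the function names are illustrative and not part of the original protocol.

```python
# Illustrative arithmetic for the spotting assay described above.
# Values come from the text; function names are hypothetical helpers.

START_DENSITY = 1.5e5   # cells/ml after the initial dilution of the overnight culture
SPOT_VOLUME_UL = 4.0    # microliters spotted per dilution


def dilution_series(start_density, fold, n_dilutions):
    """Return the starting density followed by n_dilutions serial dilutions."""
    return [start_density / fold ** i for i in range(n_dilutions + 1)]


def cells_per_spot(density_per_ml, volume_ul=SPOT_VOLUME_UL):
    """Expected number of cells in a spot of the given volume (1 ml = 1000 µl)."""
    return density_per_ml * volume_ul / 1000.0


if __name__ == "__main__":
    # Primary screen: two serial 10-fold dilutions from 1.5e5 down to 1.5e3 cells/ml.
    for density in dilution_series(START_DENSITY, fold=10, n_dilutions=2):
        print(f"{density:.1e} cells/ml -> about {cells_per_spot(density):.0f} cells per spot")
```

The same helper could be called with `fold=5` for the refined screen, although the starting density of that series is an assumption, since the text only states that the same conditions were used.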
Construction of plasmids Tagged-plasmids were constructed from CIp30 and were carrying STM tags consisting in random 40-nucleotides sequences used in C. We chose 10 distinct STM tags with numbers 6, 11, 20, 43, 209, 219, 224, 227, 232 and 240 (see for nucleotide sequences). First, complementary oligomers () containing the 40-mer sequence of the STM tag flanked by the cohesive ends with NotI and DraIII restriction sites were hybridized and digested by DraIII and NotI. Then, the double-stranded oligomers were cloned into pKS(+) (Stratagene) at DraIII and NotI sites and re-cloned into CIp30 at ScaI and NotI sites. This cloning step was necessary since we discovered after sequence analysis of CIp30 near the NotI site that the published CIp30 map was not designed from a pKS(+) backbone but from a pKS(−) backbone.
Therefore we had to subclone the double-stranded oligomers in a pKS(+) backbone to avoid the problem of the non-palindromic DraIII site. Finally 10 plasmids were generated from the CIp30 backbone containing each a specific tag and were called CIp30-STM6, −11, −20, −43, −209, −219, −224, −227, −232 and −240. Sequences of oligonucleotides used in this study. For the construction of the orf19.2646 revertant strains, pAC249 was designed to re-introduce a wild-type orf19.2646 allele at its genomic locus.
For this purpose, the SAT1 gene was amplified from pSFS2 using the primers Sat1-Not ( 5′-ATAAGAATGCGGCCGCGTCAAAACTAGAGAATAATAAAG-3′) and Sat1-BamHI ( 5′-GCAAAGGATCCCACCACCTTTGATTGTAAAT-3′). This fragment was introduced into pBluescript KS+ by BamHI and NotI to yield pDS1551. Next, orf19.2646 was amplified using the primers orf19.2646-5-for ( 5′-CGCGAGGTACCTCAATCAAGCCTCCTGTACC-3′) and orf19.2646-Rxho ( 5′-CGCGACTCGAGTGTACACAAAACTTAGAACC-3′). The resulting PCR fragment was cloned into pDS1551 by KpnI and XhoI to yield pAC249.
Preparation of C. albicans DNA from mice kidneys Frozen pellets of kidney homogenates were thawed slowly at room temperature, resuspended in 2 ml of water and then distributed equally into 2 ml screw-cap tubes. Tubes were centrifuged for 1 min at maximum speed. One pellet was kept for further analyses. The other pellets were washed three times in lysis buffer (2% Triton X-100, 1% SDS, 100 mM NaCl, 10 mM Tris HCl pH 8, 1 mM EDTA) in order to eliminate mouse tissue debris. Yeast cells were resuspended in 200 µl of lysis buffer supplemented with one volume of phenol-chloroform. Cells were next disrupted by adding one volume of glass beads and by agitating the solution for 15 seconds at a power of 5 in a Fastprep 24 instrument (MP Biomedical, France).
Suspensions were then centrifuged for 10 min at maximum speed. The aqueous phase was recovered and DNA was precipitated with 2 volumes of absolute ethanol. Pellet was resuspended in 50 µl of TE (10 mM Tris HCl pH 7.5, 1 mM EDTA) in which 10 µl of RNase A (Roche) (10 mg/ml) was added. Solution was incubated for 30 min at 37°C. Ten µl of proteinase K (Sigma) (10 mg/ml) were added and solution was incubated for 30 min at 65°C. DNA was then precipitated and resuspended in TE. After overnight incubation at 4°C, the concentration of DNA was measured using a Nanodrop ND1000 equipment (ThermoScientific, DE, USA).
Quantitative real-time PCR (qPCR) and normalization qPCR reactions were performed with 0.2 µM of each primer, 0.1 µM of probe and iTaq supermix with ROX (BioRad, Reinach, Switzerland) according to the manufacturer's instructions. Cycling conditions were as follows: 2 min at 50°C, 3 min at 95°C followed by 40 cycles of 15 s at 95°C and 1 min at 60°C.
Amplification and detection of PCR products were performed with a Step One Plus™ (Applied Biosystems, CA, USA). Reactions were performed in a total volume of 25 µl.
Data were analysed using the Step One software V2.1 (Applied Biosystems, CA, USA). In each run, a PCR control was performed with yeast DNA extract from kidneys of non-infected mice. The signals obtained gave the noise level of the qPCR. Each sample showing a CT for STM6 (wild type strain) higher than that of the non-infected mice reaction was rejected, since C. albicans DNA extraction from kidneys had probably failed. Such samples were excluded from further analysis.
Standard curves were created with 10-fold serial dilutions (10⁵ to 10¹ copies) of the ten CIp30-STM plasmids. These ten standard curves (one for each barcode) were used to determine the copy number (Qx) of the different barcodes carried by the strains of each pool, either from infected mouse kidneys (Qx_INVIVO) or from 24 h YEPD cultures (Qx_INVITRO). To determine an increase or decrease of growth of strains in vivo as compared to in vitro conditions, the “in vivo/in vitro” ratio (dQx = Qx_INVIVO/Qx_INVITRO) was calculated for each STM barcoded strain. This dQx ratio normalizes the in vivo data to the in vitro growth of each mutant strain. Finally, to obtain the colonization score of each strain (S = log2(dQx/dQ_STM6)), each dQx ratio was normalized to the dQ_STM6 of its pool in order to normalize the results across all pools. The STM6 barcode was carried by the isogenic wild type strains present in all pools as a positive control.
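The normalization described above can be sketched in a few lines. This is a minimal illustration, assuming the usual linear relationship between CT and log10(copy number) for a qPCR standard curve; the CT values in the example are invented for demonstration and are not data from this study, and all function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of CT = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the barcode copy number Qx."""
    return 10 ** ((ct - intercept) / slope)


def colonization_score(q_vivo, q_vitro, q_vivo_stm6, q_vitro_stm6):
    """S = log2(dQx / dQ_STM6), with dQ = Q_INVIVO / Q_INVITRO."""
    dq_x = q_vivo / q_vitro
    dq_stm6 = q_vivo_stm6 / q_vitro_stm6
    return math.log2(dq_x / dq_stm6)


if __name__ == "__main__":
    # Invented standard curve: 10-fold dilutions from 1e5 to 1e1 copies.
    copies = [1e5, 1e4, 1e3, 1e2, 1e1]
    cts = [20.0, 23.3, 26.6, 29.9, 33.2]
    slope, intercept = fit_standard_curve(copies, cts)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")

    # Invented copy numbers for one mutant and the STM6 wild type of its pool.
    score = colonization_score(q_vivo=100, q_vitro=1000,
                               q_vivo_stm6=400, q_vitro_stm6=1000)
    print(f"colonization score S = {score:.2f}")
```

A negative score means the strain is under-represented in the kidney relative to the wild type of the same pool after correcting for in vitro growth; a positive score means it is over-represented.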
Results In order to discover new C. albicans virulence factors and to help elucidate novel elements in host-pathogen interactions, we propose here to screen a collection of C. albicans transcription factor mutants in a murine disseminated infection model. The murine disseminated infection model is a conventional experimental model to study C. albicans pathogenicity. In this model, several organs such as brain, spleen, lungs, liver and kidneys are infected by C. albicans.
We planned to use this model of infection to evaluate organ colonization by Zn2-Cys6 TF mutant strains. For this purpose, we used mutants from an already existing collection containing 239 mutants corresponding to almost all the mutants of non-essential TF genes of C. albicans. To reduce the number of animals used in this study, the 77 Zn2-Cys6 TF mutant strains were tested in pools as described in and. We developed a strategy to tag the strains of a pool with 40-mer barcodes enabling their specific and individual detection. This original screening strategy limits the number of tags, primers and probes needed to detect each strain in a pool and thus considerably reduces the costs of screening.
In vitro detection of barcoded C. albicans strains Each infection pool contained ten strains: eight strains of the TF mutant collection, a wild type strain as a positive control and an avirulent strain (a cmp1Δ mutant) as a negative control. Each strain of the pool was tagged with a barcode consisting of a unique 40-nucleotide sequence selected among STM (Sequence Tag Mutagenesis) tags previously used in C. Each barcode was cloned in a C. albicans optimized plasmid (CIp30) and transformed into TF mutant strains as indicated in. Ten sets of Taqman probes and primers were designed as presented in for STM6 in order to specifically detect each barcode.
Our first attempt was to specifically detect each barcode in vitro. For this purpose, the ten STM-CIp30 plasmids were introduced into the wild type strain BWP17.
We thus obtained ten strains differing only by their barcodes. Using DNA from in vitro culture of each strain and from co-culture of all strains, we were able to detect each strain specifically (data not shown). Our second attempt was to determine the limit of detection of a strain within a pool. The DNAs of the ten BWP17-tagged strains, except the one carrying the STM11-tag, were mixed in an equal amount. The DNA of the STM11-tagged strain was diluted to a concentration of 10 ng/µl and 10-fold serially diluted to reach a concentration of 0.01 ng/µl. Each dilution was mixed with 10 ng of pool DNA. STM11 was detected specifically in the different samples ().
BWP17-STM11 DNA can be detected even if 500-fold diluted in the pool as compared to other DNAs. Detection of C. albicans barcoded strains in vivo The subsequent test was to assess the detection of each barcode in vivo.
We first determined the threshold above or under which a strain may be considered as colonizing mouse organs differently from the wild type. For this purpose a “BWP17 in vivo pilot assay” was performed with a pool containing ten BWP17-tagged strains as described above. This in vivo pilot assay was performed three times with groups of three mice. At day three post-infection, which is the minimal time required for C. albicans to complete a systemic infection, mouse kidneys were recovered.
C. albicans DNA extraction from mouse kidneys was optimized to reduce contamination by mammalian DNA. In parallel, C. albicans DNA was also extracted from a 24 h in vitro co-culture in 50 ml YEPD of the same inoculum (250 µl). The relative abundances of each DNA were quantified by qPCR. The detection of each barcode from mouse kidneys was successful.
The score of infection of each BWP17-tagged strain was calculated relative to that of the BWP17-STM6 strain as described in. When the results of the three independent experiments were pooled, the mean score had a value of −0.05 (± 0.72). Adding ± two standard deviations to this value gives the interval from −1.49 to 1.39, which corresponds to the threshold above or under which a strain has a different colonization potential as compared to the wild type. Each pool contains a cmp1Δ strain as a negative control of infection. CMP1 encodes the catalytic subunit of calcineurin.
Previous studies extensively described that strains lacking a functional calcineurin are avirulent and unable to colonize mouse organs in the model of disseminated infection. To validate our screening system, we verified whether or not such a strain could be detected from infected mouse tissues. For this purpose, we performed a so-called “CMP1” pilot experiment, in which a pool containing a BWP17-STM6 tagged strain and nine cmp1Δ strains carrying the other barcodes was injected in groups of three mice. The cmp1Δ mutants (all tags combined) showed a score of −4.02 (± 2.12) (data not shown), thus confirming that the system was able to detect a strain with a colonization score lower than the wild type. These in vivo pilot assays demonstrated that our screening strategy was functional in vivo and also defined the limits of detection and interpretation of the results.
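The threshold derived from the BWP17 pilot assay is simply the mean score plus or minus two standard deviations. The sketch below reproduces that interval from the reported values (−0.05 ± 0.72) and applies it to a score; the function names and category labels are illustrative choices, not terms defined by the study's analysis software.

```python
def threshold_interval(mean, sd, k=2.0):
    """Return (low, high) = mean -/+ k standard deviations."""
    return mean - k * sd, mean + k * sd


def classify(score, low, high):
    """Label a colonization score relative to the pilot-assay interval."""
    if score > high:
        return "hyper-colonizer"
    if score < low:
        return "hypo-colonizer"
    return "not different from wild type"


if __name__ == "__main__":
    # Pooled BWP17 pilot assay values reported in the text.
    low, high = threshold_interval(mean=-0.05, sd=0.72)
    print(f"interval: {low:.2f} to {high:.2f}")

    # Mean score reported for the cmp1 deletion mutants in the CMP1 pilot experiment.
    print(classify(-4.02, low, high))
```

Note that this treats the interval as a simple cut-off on the mean score of a strain; it does not account for the spread of individual replicate scores.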
Analysis of ten pools of Zn2-Cys6 TF mutant strains The screening of the 77 C. albicans Zn2-Cys6 mutant strains in a mouse model of systemic infection was performed by designing ten pools of infection as detailed in. These 77 mutants correspond to only 74 mutated genes since three genes were mutated twice independently ( i.e. orf19.5133, orf19.3012 and orf19.
All mutant strains were transformed with plasmids derived from STMx-CIp30. These plasmids, which carry the STM tags, also allow the re-introduction of URA3, HIS1 or ARG4 markers at a locus that is neutral for virulence. Pool 1 contained nine strains with a CAF4-2 background. The seven Zn2-Cys6 TF mutants and the cmp1Δ mutant of the pool were constructed using URA blaster cassettes in the CAF4-2 background. Therefore, CAF4-2 containing STM6-CIp30 was the positive control.
The negative control was DSY2101 ( cmp1Δ) containing STM240-CIp30 (). The nine other pools contained isolates with a BWP17 background. Each pool except pools 2 and 6 contained eight Zn2-Cys6 TF mutants (only seven for pools 2 and 6), BWP17 tagged with STM6 and DSY4343 ( cmp1Δ) tagged with STM240. Mutants of these pools were obtained using different strategies as detailed in.
Our first analysis was performed on cmp1Δ strains. When merging the scores of BCY82 ( cmp1Δ in the BWP17 background) from pools 2 to 10 and from the “ CMP1 pool” as detailed above, we obtained an average score of −3.83 (± 2.42).
It is important to note that in some experiments, no qPCR signals could be obtained for BCY82, probably because the DNA of this strain was under the limit of detection. This means that the DNA of this strain was at least 500-fold more diluted as compared to other DNAs of the pool, which included mouse DNA from kidneys. BCY13 (cmp1Δ mutant in a CAF4-2 background) showed a score of −4.15 (± 2.44) ( and Supplementary File S1). Both scores obtained for the cmp1Δ strains (BCY82 and BCY13) were under the threshold value of −1.49 mentioned above, thus confirming that these strains colonize mouse tissues less efficiently than the wild type, independently of the genetic background. After setting the detection limits of our in vivo screening system, we determined the colonization scores of all strains of the Zn2-Cys6 TF mutant collection in comparison to their isogenic wild type strains. The scores of all strains are presented in and detailed in. The majority of the strains have a score within the interval delimited by the “BWP17 pool” assay, indicating that their colonization scores were not significantly different from those of the wild type strains.
In contrast, four strains (referred to as hyper-colonizers), i.e. BCY27, BCY150, BCY160 and BCY162, showed scores of 2.36, 3.23, 2.42 and 3.18, respectively, and were above the “BWP17 pool” limit. This suggests a higher colonization capacity than the wild type. Next, eight strains (referred to as hypo-colonizers), i.e. BCY36, BCY21, BCY164, BCY88, BCY112, BCY122, BCY148 and BCY152, showed scores below the “BWP17 pool” limit (−3.2, −2.57, −2.34, −1.65, −3.49, −2.08, −2.11, and −2.08, respectively).
These data suggest that these strains are poor colonizers as compared to the wild type. Strain scores of mice kidney colonization. The four hyper-colonizer strains correspond to mutations of orf19.4166, orf19.6781, orf19.1685 and orf19.391. The orf19.4166 (ZCF21), orf19.6781 (ZFU2) and orf19.1685 (ZCF7) genes encode TFs with yet unknown functions. Interestingly, orf19.391 (also called UPC2) encodes a TF regulating genes of the ERG family involved in sterol biosynthesis and therefore plasma membrane integrity. Upc2 was also reported to be involved in antifungal resistance due to the acquisition of gain-of-function mutations. The eight hypo-colonizer strains correspond to mutations of orf19.3405 (ZCF18), orf19.255 (ZCF1), orf19.5133 (ZCF29), orf19.2747 (RGT1), orf19.7570 (UGA3), orf19.6182 (ZCF34), orf19.3753 (SEF1) and orf19.2646 (ZCF13).
Interestingly, orf19.5133 encodes a TF with yet unknown function and is deleted and interrupted in strains BCY164 (pool 3) and BCY158 (pool 10), respectively. BCY164 (orf19.5133) contained the tag STM209 and showed a score of −2.34 (± 1.13). BCY158, tagged with STM224, did not appear as hypo-colonizer since no signal above the background was obtained by qPCR for this strain. Five other genes including orf19.3405, orf19.255, orf19.6182, UGA3 and orf19.2646 encode Zn2-Cys6 TF with yet unknown function.
These TFs have been analysed in large-scale screenings of mutants or found as target genes in transcriptional microarray analyses. RGT1 encodes a transcriptional repressor involved in the regulation of glucose transporter genes. SEF1 encodes a TF involved in iron assimilation and survival in stationary phase at 30°C. Single strain infections In order to validate our previous results and to eliminate false positives due to a pool effect in the infections, we performed single strain infections with seven hypo-colonizers and BCY31 (BWP17 tagged with STM6). As in the case of the pool experiments, groups of 3 mice were infected intravenously with 5×10⁵ cells of C. albicans mutant strains.
Kidney colonization was next analysed by counting C. albicans CFUs in kidneys at three days post-infection as performed in the pool experiments. Results are presented in. We observed that the fungal burdens of all mutant strains were lower than that of the wild type. Nevertheless, only mutants for orf19.3405 and orf19.2646 showed a statistically significant difference in tissue colonization as compared to BCY31 (8.11×10³ and 1.09×10⁴ CFU/g kidney for the mutants and 1.24×10⁷ for BCY31). Therefore, these two mutants constitute interesting candidates for further analyses.
In vitro phenotypes of Zn2-Cys6 TF mutants In an attempt to correlate in vivo results with putative functions of identified TFs, the whole collection was screened for six phenotypes potentially relevant for in vivo growth by comparing each mutant with its parent wild type strain. First, we determined resistance to heat and oxidative stress by incubation at 42°C and in H₂O₂-supplemented medium, respectively, and growth capacities at alkaline or acidic pH of each mutant on rich media. Considering the environmental conditions that C. albicans faces within the host, an enhanced or reduced resistance to heat, oxidative or pH stresses may give indications regarding the involvement of a given Zn2-Cys6 TF in virulence and/or colonization abilities. Since switching from the yeast to the hyphal form is known to be a determinant of virulence, the ability of each mutant to produce true hyphae and/or pseudohyphae was also assessed. For that purpose, colony wrinkling was visually inspected after 72 h of incubation on rich media supplemented with 10% serum, which is known to induce filamentation in C. albicans.
Moreover, agar invasion was evaluated by washing YEPD agar plates after 48 h of incubation at 35°C and by visually assessing a decrease or an increase of invasive growth as compared to the wild type strain. Raw pictures of this screening are provided as supplementary data.
Lastly, pictures of cells grown for 4 h in YEPD liquid media supplemented with 10% FCS were analyzed for the presence of true hyphae. Overall, 39 out of 74 mutants displayed an altered phenotype regarding the six criteria tested here (, and, sheet “phenotype”). Only a few mutants displayed altered susceptibilities to heat, pH and H₂O₂. Indeed, none of the mutants showed impaired growth at acidic pH and only one of them, SEF1, exhibited decreased growth at alkaline pH. Likewise, only one mutant, orf19.3188 (TAC1), was hypersusceptible to H₂O₂. Two mutants had a decreased growth capacity at 42°C, namely orf19.5849 (CWT1) and orf19.2646 (ZCF13). The largest phenotypic alterations were observed in morphology-discriminative conditions since, among the 39 mutants displaying a phenotype in our screen, 28 were selected for their abnormal morphology on solid YEPD 10% FCS media and/or their altered ability to invade agar ( and, sheet “phenotype”).
Out of 22 mutants with abnormal morphology, 13 produced less wrinkled colonies as compared to their respective wild-type parents and 9 an increased colony wrinkling. Regarding agar invasion ability, 6 and 5 mutants showed increased and decreased invasion in the solid substrate, respectively, as compared to the wild-type ( and ). ORFs deleted in the Zn2Cys6 TF mutants with altered phenotypes on solid media as compared to wild type strain. Liquid YEPD 10% FCS media cultures allowed the selection of 21 mutants with a modified cell morphology as compared to their parental wild type strains. Mutants with more than 80% of cells forming hyphae were scored as positive. In contrast, mutants showing less than 10% of hyphae were scored as negative (see and, sheet “Phenotype”). Following these criteria, 11 mutants displayed almost no hyphae formation.
In contrast, 10 displayed an increase in true hyphae formation. In vivo single infection experiments allowed the selection of two interesting TF (orf19.2646 and orf19.3405) mutant strains for their hypo-colonizer phenotype in kidneys. Mutant for orf19.2646 produced highly wrinkled colonies on YEPD agar medium supplemented with 10% FCS and also better invaded the agar as compared to the wild-type.
The mutant for orf19.3405 showed a growth deficiency in all conditions tested but displayed an increased production of true hyphae in liquid YEPD supplemented with 10% FCS. Reversion of the orf19.2646 deletion phenotypes We further investigated orf19.2646 since it appeared as a promising candidate to discover yet unknown virulence factors. To confirm that both the in vitro and in vivo phenotypes observed were effectively due to the interruption of orf19.2646, we constructed a revertant strain by re-introduction of a wild-type gene at the genomic locus in the BCY152 background (see ). As shown in, the revertant strain, like the parent wild type, exhibited no growth deficiency at 42°C and no longer displayed a wrinkled colony morphology, in contrast to the mutant strain.
Likewise, animal experiments confirmed that the re-introduction of orf19.2646 restored a colonization capacity comparable to that of the wild-type strain and statistically different from that of the mutant. Discussion In this work, we screened at large scale the role of C. albicans Zn2-Cys6 TFs in organ colonization using a mouse model of systemic infection. For this purpose, we developed a screening strategy infecting mice with pools of ten strains tagged with a set of ten barcodes consisting of 40-nucleotide sequences.
Among the 77 Zn2-Cys6 TF mutants tested, corresponding to 74 mutated ORFs, four resulted in a hyper-colonizer phenotype while eight showed a hypo-colonizer phenotype (see ). Recently, similar large-scale in vivo mutant screenings were performed by Noble et al. and Chamilos et al.
This last work was performed in the Toll mutant fly model and focused on 33 TF mutants with no restriction on the TF class. The mutants used came from the same TF mutant collection as used here. Twelve mutants were common between this study and ours (see, “all-scores” sheet). Among these twelve mutants, none were found to be hypo-colonizers in the fly model and only one (UGA3) was found to be a hypo-colonizer in our pool experiment.
The use of two distinct infection models might explain the discrepancies between both studies. The work of Noble et al. was more similar to our analysis since the mutant strains of that study were barcoded and used to infect mice as pools. Nevertheless, our analyses were different since Noble et al. did not target a specific class of genes and developed their own screening strategy associated with their particular mutant collection. Consequently, out of the 77 Zn2-Cys6 TF mutants tested here, only 26 were common with the study of Noble et al. (see, “all-scores” sheet).
Three of the eight hypo-colonizer mutants, namely mutants for orf19.255 (ZCF1), orf19.6182 (ZCF34) and SEF1, were also tested in their study.
Only the mutant for SEF1 showed a hypo-colonizer phenotype in both studies. Transcriptional analyses on iron metabolism in C. albicans showed that SEF1 might be involved in iron uptake regulation. Interestingly, Chen et al. demonstrated recently that SEF1, as observed here, is critical for in vivo colonization. The SEF1 deletion was accompanied by a decreased virulence in the mouse bloodstream infection model. Importantly, none of the mutants found as hypo-colonizers by Noble et al. were missed using our strategy.
Besides, none of the four mutants that were found to be hyper-colonizers were identified by Noble et al. To our knowledge, the remaining four mutants found here as hypo-colonizers, including mutants for orf19.5133 (ZCF29), orf19.3405 (ZCF18), orf19.2646 (ZCF13) and RGT1, have not yet been identified in studies addressing virulence or organ colonization in animal studies. In vitro phenotypes of C. albicans Zn2-Cys6 TF mutants with hypo- or hyper-colonization phenotype in a mice model of disseminated candidiasis. To further characterize the putative function of the Zn2-Cys6 TFs deleted in our study, we screened the whole collection for altered hyphae formation and colony morphology on serum, invasive growth and susceptibility to pH, heat and oxidative stresses.
These six phenotypic criteria are known to play a major role in C. albicans pathogenesis.
The SEF1 mutant showed impaired growth at alkaline pH, as already described by Lan et al. Only one mutant displayed altered susceptibility to H2O2. Interestingly, this mutant was TAC1 (orf19.3188), the main regulator of the expression of genes encoding transporters of the ATP-binding cassette family, which are known to mediate azole resistance. It has been reported that TAC1 regulates some genes involved in the oxidative stress response, such as GPX1 and SOD5. Moreover, several studies have suggested that azole drugs can generate oxidative stress by disturbing mitochondrial metabolism. Our observation further reinforces this hypothesis.
Lastly, only two mutants were hypersusceptible to heat stress, namely the mutants for orf19.5849 (CWT1) and orf19.2646 (ZCF13). To our knowledge, and without further investigation, no obvious correlation between impaired growth at 42°C and the putative functions of these TFs could be observed. In contrast to susceptibility to pH, oxidative and heat stresses, the phenotypes most often altered in our TF mutants were hyphae production in liquid medium supplemented with serum, colony morphology, and invasive growth. In C. albicans, several parameters are known to induce hyphal formation (serum and temperature, for example); however, the mechanisms by which this yeast senses contact with a surface and promotes invasion of soft substrates such as agar medium or host tissue are still not fully understood. Invasive growth and colony wrinkling are nevertheless two partially independent mechanisms (or at least mechanisms mediated by different effectors) that help C. albicans cope with the requirements of its environment. Our results further demonstrate that agar invasion, colony wrinkling, and hyphae formation are not correlated, since large discrepancies were observed in the in vitro phenotypic screening of our TF mutant collection.
More than the hyphal form itself, the ability of C. albicans to switch from yeast to hyphae or pseudohyphae is of crucial importance for its virulence.
Accordingly, no correlation between colony wrinkling and/or hyphae formation in liquid medium and reduced organ colonization could be drawn from our results. Indeed, only four of our hypo-colonizer mutants (orf19.5133, orf19.2747, orf19.3753, and orf19.1685) showed reduced hyphae production. Likewise, only one hypo-colonizer mutant lost its capacity to invade agar (orf19.2747), and only two (orf19.5133 and orf19.1685) displayed smooth colonies. Even the mutants for orf19.2646 and orf19.3405, although compromised in organ colonization, produced highly wrinkled colonies on serum-supplemented media and a larger amount of hyphae in the presence of serum, respectively, as compared to the wild type. Likewise, no correlation between increased filamentation on serum and colonization ability could be established, since all but one (orf19.6781) of the hyper-colonizer mutants produced highly wrinkled colonies on serum-supplemented media.
This enhanced ability to produce hyphae may partially explain their higher ability to colonize mouse tissues.
Venn diagram of C. albicans Zn2-Cys6 TF mutants with an in vitro and/or an in vivo phenotype.
One unexpected observation was the poor correlation between decreased invasive growth and hypo-colonizer phenotypes, as only two hypo-colonizer TF mutants (for RGT1 and CTA4) displayed decreased agar invasion. This result may appear surprising, since it is well established that invasion ability is a crucial determinant of the capacity of C. albicans to colonize host tissues. RGT1 was previously analysed extensively in vitro for its role in sugar transport and metabolism; the decreased colonization ability of the RGT1 mutant might therefore be linked not only to its decreased invasion capacity but probably also to altered sugar metabolism. The CTA4 mutant, which also produced smooth colonies on serum-supplemented media and no hyphae in liquid media, has been described as having decreased virulence by Chiranand et al.
Even if virulence and colonization are two distinct mechanisms of pathogenesis, it would be interesting to address whether this mutant also exhibits decreased virulence in the mouse model of disseminated infection. In conclusion, the reduced or enhanced ability of most of the identified mutants to colonize mouse kidneys could be at least partially explained by their in vitro phenotype. However, a systematic relationship between the criteria investigated here and colonization ability could be ruled out from our results. Indeed, among the 39 TF mutants positively selected in our in vitro tests, only 7 had a modified colonization capacity in mice as compared to their wild-type strains. Moreover, five of the twelve TF mutants with altered colonization did not have any in vitro phenotype, suggesting that their increased or decreased colonization ability could be attributed to alternative virulence traits. Finally, to validate the results obtained with hypo-colonizer mutants, single-strain infections were performed with the seven hypo-colonizer mutants and the wild-type strain.
Although all mutants showed decreased colonization as compared to the wild type, only two mutants, those for orf19.2646 (ZCF13) and orf19.3405 (ZCF18), showed a significant reduction of colonization by CFU counting in kidneys. Surprisingly, these two strains did not show the lowest colonization scores in the pool experiments. This result can be explained by a “pool effect”, which might influence the colonization of each strain of the pool: competition between strains can modify the behavior of a strain in mouse organs as compared to single-strain infection. Our in vitro analysis showed that the mutant for orf19.3405 (ZCF18) is slightly growth-deficient, which might contribute to its hypo-colonizer phenotype.
Regarding the mutant for orf19.2646 (ZCF13), we observed an increased ability to filament and to invade agar, and a growth deficiency at 42°C. The phenotypes observed in vitro and in vivo were reverted by the re-introduction of a wild-type allele in the mutant strain, confirming that they were indeed due to the orf19.2646 (ZCF13) mutation.
ZCF13 (orf19.2646) therefore represents a very interesting candidate for further analyses. Further characterization of the function of this TF in virulence, together with the identification of its target genes, may constitute a promising basis for a better understanding of host-Candida interactions. In addition, analysis of the remaining mutants of our collection may reveal other interesting candidates for involvement in colonization and virulence. Overall, our study demonstrates that our original strategy is a powerful tool to detect both in vivo and in vitro competitive fitness.
This last in vitro approach is currently used in our laboratory. Moreover, this detection system is easy to implement, could be adapted to any pre-existing mutants, and its efficacy could be enhanced by the use of additional STM tags.
Supplementary S1
This file contains five sheets. The “Zn2Cys6 TF” sheet describes the mutant collection and provides, for each mutant, the ORF number, gene name (when available) and description according to the Candida Genome Database assembly #21.
It also provides information on the inactivation strategy and the numbering of barcoded strains (BCY). The “Phenotype” sheet describes the phenotype that was observed in vitro and gives a summary of the already described phenotypes for each mutant. The “All scores” sheet gives the scores obtained for each replicate of the in vivo experiments as well as the mean score for each mutant.
It also gives information regarding previous large-scale in vivo studies. The “BWP17 pilot assay” sheet gives the scores obtained for each replicate of the first in vivo pilot assay performed with the 10 BWP17 barcoded strains. Results are presented by STM.
The “Supplementary references” sheet lists the references used in the “Phenotype” sheet.
Supplementary S6
Raw pictures of the additional in vitro phenotypic screen performed for morphology determination. Each TF mutant is referenced according to its barcoded strain number and has to be compared with its respective parental strain (barcoded strain number 11 for CAF4-2 based TF mutants and barcoded strain number 31 for BWP17 based TF mutants), which was spotted on each plate for reference. This screen was performed on YEPD agar plates supplemented with 10% serum after 24 and 72 h of incubation at 35°C.
Supplementary S7
Raw pictures of the in vitro phenotypic screen for TF mutants carrying the barcoded strain numbers BCY48, BCY122, BCY124, BCY126, BCY166 and BCY164. These strains were obtained at a later stage of this study and were therefore assayed independently from the main screen.
Each TF mutant is referenced according to its barcoded strain number and has to be compared with the parental strain (barcoded strain number 31), which was spotted on each plate for reference.