Friday, June 7, 2013

CCP and Prostate Cancer

There is always interest in determining the prognosis of a tumor and, one hopes, in staging treatment accordingly. There has been a recent flurry of interest in “cell cycle progression” (CCP) gene testing, a method that takes gene products from biopsy samples and uses them to ascertain the most likely progression of the tumor.

We summarize here our opinions as stated in a recent White Paper.

CCP is a methodology proposed to do this. We take no position in this opinion paper regarding the efficacy of CCP as applied to PCa, but we examine the original assertions in some detail. Conceptually the approach makes sense. It proceeds as follows:

1. A handful of genes, if over-expressed, can, when combined with other metrics, provide fairly accurate prognostic measures of PCa.

2. Selecting the genes can be accomplished in a variety of ways, ranging from logical and clear pathway control genes such as PTEN to a broad-based sampling in which the selected genes show statistically strong predictive power.

3. Measuring the level of expression in some manner and combining the measurements in a reasonable fashion to determine a broad-based metric.

4. Combining the gene expression metric with other variables to obtain a stronger overall metric.

The CCP work to date has been focused somewhat on these objectives.
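To make steps 3 and 4 concrete, here is a minimal sketch of one way such a metric could be formed. It is not the published CCP calculation: the gene names, the z-score averaging, the reference values, and the weights are all hypothetical placeholders chosen for illustration.

```python
import math
import statistics

def expression_score(log2_expr, reference_mean, reference_sd):
    """Average the per-gene z-scores of one sample against a reference cohort.

    All three arguments are dicts keyed by gene name. The z-score averaging
    is an illustrative assumption, not the published CCP calculation.
    """
    z_scores = [
        (log2_expr[g] - reference_mean[g]) / reference_sd[g]
        for g in log2_expr
    ]
    return statistics.fmean(z_scores)

def combined_metric(score, psa_ng_ml, w_score=1.0, w_psa=0.5):
    """Linear combination of the expression score with log(1 + PSA).

    The weights are placeholders; in practice they would be fitted to
    outcome data.
    """
    return w_score * score + w_psa * math.log1p(psa_ng_ml)

# Hypothetical three-gene panel, values in log2 units.
reference_mean = {"GENE_A": 4.0, "GENE_B": 6.0, "GENE_C": 5.0}
reference_sd = {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": 2.0}
sample = {"GENE_A": 5.0, "GENE_B": 6.5, "GENE_C": 7.0}

score = expression_score(sample, reference_mean, reference_sd)
metric = combined_metric(score, psa_ng_ml=6.0)
print(f"expression score: {score:.2f}, combined metric: {metric:.2f}")
```

The point of the sketch is only that every choice in it (normalization, averaging, weights) must be stated explicitly before anyone else can reproduce the number.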

Let us now briefly update the work as detailed in the industry press. As indicated in a recent posting:[1]

.....initially measured the levels of expression of a total of 31 genes involved in CCP. They used these data to develop a predefined CCP “score” and then they set out to evaluate the value of the CCP score in predicting risk for progressive disease in the men who had undergone an RP or risk of prostate cancer-specific mortality in the men who had been diagnosed by a TURP and managed by watchful waiting.

Thus there seems to be a strong belief in the use of CCP, especially when combined with other measures such as PSA.

The CCP test has been commercialized as Prolaris by Myriad. In a Medscape posting they state[2]:

"PSA retained a fair amount of its predictive value, but the predictive value of the Gleason score "diminished" against the CCP score." he said. "Once you add the CCP score, there is little addition from the Gleason score, although there is some."

"Overall, the CCP score was a highly significant predictor of outcome in all of the studies,...it was the dominant predictor in all but 1 of the studies in the multivariate analyses, and typically a unit change in the score was associated with a remarkably similar 2- to 3-fold increase in either death from prostate cancer or biochemical recurrence, indicating that this is a very robust predictor, and seems to work in a whole range of circumstances."

Thus there is some belief that CCP when combined with other metrics has strong prognostic value.

In this analysis we use CCP as both an end and a means to an end. CCP is one of many possible metrics to ascertain prognostic values. There is a wealth of them. We thus start with the selection of genes. We first consider general issues and then apply them to the CCP approach. This is the area where we have the majority of our problems.

Let us first examine how the authors obtained the data. We shall follow the text of the 2011 paper and then comment accordingly.

1. Extract RNA

2. Treat the RNA with enzyme to generate cDNA

3. Collect the cDNA and confirm the generation of key entities.

4. Amplify the cDNA

5. Pre amplify the cDNA prior to measuring in an array.

7. In arrays record levels of expression

Clearly there may be many sources of noise or error in this approach, especially in recording the level of fluorescent intensity. The problem, however, is that every step carries the possibility of measurement bias or error. These errors are additive and can substantially alter the resulting data.
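The accumulation of error across the steps above can be illustrated with a small simulation. This is a sketch under stated assumptions: the per-step noise levels are invented, the errors are taken to be independent, zero-mean, and additive in log2 units, and none of these figures come from the 2011 paper.

```python
import math
import random
import statistics

# Hypothetical per-step noise levels (standard deviations, in log2 units).
STEP_SIGMAS = {
    "rna_extraction": 0.20,
    "cdna_synthesis": 0.15,
    "pre_amplification": 0.25,
    "amplification": 0.25,
    "array_readout": 0.30,
}

def simulate_measurement(true_log2_expr, rng):
    """Return one noisy measurement: the true value plus independent noise from each step."""
    return true_log2_expr + sum(rng.gauss(0.0, s) for s in STEP_SIGMAS.values())

def predicted_total_sigma():
    """Independent additive errors combine in quadrature (variances add)."""
    return math.sqrt(sum(s * s for s in STEP_SIGMAS.values()))

rng = random.Random(0)
samples = [simulate_measurement(5.0, rng) for _ in range(20000)]
empirical_sigma = statistics.stdev(samples)
print(f"predicted sigma: {predicted_total_sigma():.3f}")
print(f"empirical sigma: {empirical_sigma:.3f}")
```

Under these assumptions no single step dominates, yet the combined spread is roughly twice that of any one step. Systematic bias, which this sketch ignores, would not average out at all.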

In this section we consider the calculations needed to develop a reliable classifier. This is a long-standing and classic problem. Simply stated:

“Assume you have N gene expression levels, G(i), and you desire to find some function g(G(1),…,G(N)) such that this function g divides the space created by the Gs into two regions, one with no disease progression and one with disease progression.”

Alternatively we could ask for a function f(G(1),…,G(N)) such that the probability of disease progression, or an end point of death in a defined period, is f or some function derived therefrom.

Let us begin with general classifiers. First let us review the process of collecting data. The general steps are above. We start with a specimen and we end up with N measurements of gene expression. In the CCP case we have some 31 genes we are examining and ascertaining their relative excess expression. As posed above, we seek either a classifier g that bifurcates the space of the N genes or a function f from which we could ascertain survival based upon the N gene expression measurements.

Now from classic classifier analysis we can develop the two metrics: a simple bifurcating classifier and a probability estimator. The simple classifier generates a separation point, line, or plane, for which being below is benign and being above is problematic. This is akin to the simple PSA test of being above or below 4.0. However, we all know that this has its problems. Thus there may be some validity in the approach for prognostic purposes. Clearly a high value indicates a significant chance of mortality, one assumes directly related to this disease.
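The two metrics can be sketched side by side. The example below assumes the simplest case, a linear discriminant for g and a logistic function of the same score for f. The weights, bias, and two-gene inputs are made up for illustration and are not fitted to any data.

```python
import math

def linear_discriminant(expr, weights, bias):
    """g(G(1),...,G(N)) = w . G + b; the sign of g selects the region."""
    return sum(w * x for w, x in zip(weights, expr)) + bias

def classify(expr, weights, bias):
    """Bifurcating classifier: True means 'progression' (g > 0)."""
    return linear_discriminant(expr, weights, bias) > 0.0

def progression_probability(expr, weights, bias):
    """Probability estimator f: a logistic function of the same linear score."""
    return 1.0 / (1.0 + math.exp(-linear_discriminant(expr, weights, bias)))

# Hypothetical two-gene example; weights and bias are made up, not fitted.
weights = [0.8, 1.2]
bias = -10.0
low = [4.0, 4.0]    # g is about -2, below the separating line
high = [6.0, 6.0]   # g is about +2, above the separating line

for name, expr in (("low", low), ("high", high)):
    print(name, classify(expr, weights, bias),
          round(progression_probability(expr, weights, bias), 3))
```

The classifier discards how far a sample lies from the boundary, whereas the probability estimator retains it. That difference is why the choice between g and f should be stated explicitly.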

Let us now examine the CCP index calculation in some detail. We use the 2011 paper as the source. The subsequent papers refer back to this and thus we rely upon what little is presented there. Our approach is to take what the original paper stated and, line by line, establish a mathematical model; where concerns or ambiguities arise, we point them out for subsequent resolution. In our opinion the presentation of the quantitative model is seriously flawed in terms of its explanation, and we shall show the basis of our opinion below.

We have provided a detailed examination in our recent White Paper. In our opinion there is a lack of transparency and reproducibility in the 2011 paper and thus one cannot utilize what is presented.

This area of investigation is of interest, but in my opinion it raises more questions than it answers. First is the issue of the calculation itself and its reproducibility. Second is the issue of the substantial noise inherent in the capture of the data.

1. Pathway Implications: Is this just another list of Genes?

The first concern is the fact that we know a great deal about ligands, receptors, pathway elements, and transcription factors. Why, one wonders, do we seem to totally neglect that source of information?

2. Noise Factors: The number of genes and the uncertainties in measurements raise serious concerns as to stability of outcomes.

Noise can severely detract from the usefulness of the measurement. There are many sources of such noise, especially in measuring the fluorescent intensity. One wonders how they factor into the analysis. Many other sources are also present, from the PCR process and copy numbers to the very sampling and tissue integrity factors.

3. Severity of Prognosis and Basis: For a measurement which is predicting patient death one would expect total transparency.

The CCP discriminant argues for the most severe prognostication. Namely it dictates death based upon specific discriminant values. However as we have just noted, measurement noise can and most likely will provide significant uncertainty in the “true” value of the metric.
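The effect of noise near a cut-off can be quantified with a short sketch. It assumes the measured score equals the true score plus zero-mean Gaussian noise. The threshold and noise level are hypothetical placeholders, not values from any CCP publication.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_above_threshold(true_score, threshold, sigma):
    """P(measured score > threshold) when measured = true + N(0, sigma^2)."""
    return 1.0 - normal_cdf((threshold - true_score) / sigma)

# Hypothetical numbers: threshold and sigma are placeholders.
threshold = 1.0
sigma = 0.5
for true_score in (0.0, 0.5, 0.9, 1.0):
    p = prob_above_threshold(true_score, threshold, sigma)
    print(f"true score {true_score:.1f}: P(called high risk) = {p:.3f}")
```

A patient whose true score sits just below the threshold is called high risk almost as often as not, which is why the noise level matters as much as the threshold itself.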

4. Flaws in the Calculation Process: Independent of the lack of apparent transparency, there appear in my opinion to be multiple points of confusion in the exposition of the methodology.

In our opinion, there are multiple deficiencies in the presentation of the desired calculation of the metric proposed which make it impossible to reproduce it. We detail them in our White Paper.

5. Discriminants, Classifiers, Probability Estimators: What are they really trying to do?

The classic question when one has N independent genes and when one can measure relative expression is how does one take that data and determine a discriminant function. All too often the intent is to determine a linear one dimensional discriminant. At the other extreme is a multidimensional non-linear discriminant. This is always the critical issue that has been a part of classifiers since the early 1950s. In the case considered herein there is little if any description of or justification of the method employed. One could assume that the authors are trying to obtain an estimate of the following:

P[Death in M months] = g(G1,...,GN)

where Gk is the level of expression of the k-th of the 31 genes. One would immediately ask: why and how? In fact we would be asked to estimate a Bayesian measure:

P[Death in M months|G1,...GN]

which states that we want the conditional probability. We know how to do this for systems but this appears at best to be some observational measure. This in my opinion is one of the weak points.
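For reference, the conditional probability can be computed from Bayes' rule once class-conditional distributions and a prior are specified. The sketch below collapses the N gene measurements into a single score and assumes Gaussian score distributions with equal variance in the two outcome groups. The prior, means, and variance are hypothetical, and nothing here is taken from the 2011 paper.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_death(score, prior, mean_death, mean_alive, sd):
    """Bayes' rule: P[death | score] from class-conditional densities and a prior."""
    like_death = gaussian_pdf(score, mean_death, sd) * prior
    like_alive = gaussian_pdf(score, mean_alive, sd) * (1.0 - prior)
    return like_death / (like_death + like_alive)

# Hypothetical inputs: a 10% prior and unit-variance score distributions.
for score in (0.0, 1.0, 2.0):
    p = posterior_death(score, prior=0.10, mean_death=1.5, mean_alive=0.0, sd=1.0)
    print(f"score {score:.1f}: P[death in M months | score] = {p:.3f}")
```

Each input to this calculation (the prior, the two distributions, and the time horizon M) is something a reader would need to see stated in order to reproduce a reported probability.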


6. Causal Genes, where are they?

One of the major concerns is that one gene's expression may be caused by another gene. Among these 31 genes there may be some such causality, and this may skew the results.

7. Which Cell?

One of the classic problems is measuring the right cell. Do we want the stem cell? If so, how is it found? Do we want metastatic cells? If so, from where do we get them? Do we want just local biopsy cells? If so, perhaps they under-express the facts.

8. Why this when we have so many others?

We have PSA, albeit with issues, we have SNPs, we have ligands, receptors, pathway elements, transcription factors, miRNAs and the list goes on. What is truly causal?

Basically this approach has possible merit. The problem, in my opinion, is the lack of transparency in the description of the test metric. The inherently noisy data are also a concern. Moreover, one wonders why the test has received so much press.


1. Cooperberg, M., et al, Validation of a Cell-Cycle Progression Gene Panel to Improve Risk Stratification in a Contemporary Prostatectomy Cohort, https://s3.amazonaws.com/myriad-library/Prolaris/UCSF+ASCO+GU.pdf
2. Cooperberg, M., et al, Validation of a Cell-Cycle Progression Gene Panel to Improve Risk Stratification in a Contemporary Prostatectomy Cohort, Journal of Clinical Oncology, 2012.
3. Cuzick, J., et al, Prognostic value of a cell cycle progression signature for prostate cancer death in a conservatively managed needle biopsy cohort, British Journal of Cancer (2012) 106, 1095–1099.
4. Cuzick, J., et al, Prognostic value of an RNA expression signature derived from cell cycle proliferation genes for recurrence and death from prostate cancer: A retrospective study in two cohorts, Lancet Oncol. 2011 March; 12(3): 245–255.
5. Duda, R., et al, Pattern Classification, Wiley (New York) 2001.
6. McGarty, T., Prostate Cancer Genomics, Draft 2, 2013, http://www.telmarc.com/Documents/Books/Prostate%20Cancer%20Systems%20Approach%2003.pdf
7. Theodoridis, S., K., Koutroumbas, Pattern Recognition, AP (New York) 2009.