The opinion of the court was delivered by: Poritz, C.J.
On review of the Recommendations of Special Master.
This matter completes the second phase in a two-part review of the Court's proportionality review procedures undertaken pursuant to our Order in State v. Loftin, 157 N.J. 253, 454-55 (Loftin II), cert. denied, ___ U.S. ___, 120 S. Ct. 229, 145 L. Ed. 2d 193 (1999).
In Loftin II, the Court found that the proportionality review methodologies we had been using were seriously flawed and that the time had come for a "careful reconsideration" of our approach to the proportionality phase of death penalty review. Id. at 286. We established a process for reconsideration that included the designation of a Special Master and the submission of a report to the Court covering "four discrete areas of concern: the size of the universe of comparison cases; particular issues in respect of individual proportionality review; questions relating to the statistical models used in both individual and systemic proportionality review; and the status of proportionality review as a separate proceeding in death penalty appeals." Ibid. The first phase of this project was completed on April 28, 1999, when the Honorable David S. Baime, a Presiding Judge of the Appellate Division appointed Special Master for the Supreme Court, submitted an initial report wherein he made findings and recommendations regarding the size of the universe, individual proportionality review and the models used for that purpose, and the feasibility of consolidating direct death penalty appeals with proportionality review. See David S. Baime, Report to the New Jersey Supreme Court: Proportionality Review Project (Apr. 28, 1999) (Baime Report I); see also In re Proportionality Review Project, 161 N.J. 71 (1999) (Proportionality Review I) (adopting Baime Report I with modifications).
Upon our consideration of Baime Report I, this Court issued determinations regarding those issues in August 1999, thus establishing baseline procedures for individual proportionality review.
On December 1, 1999, Special Master Baime issued Report to the New Jersey Supreme Court: Systemic Proportionality Review Project (Dec. 1, 1999) (Baime Report II). Baime Report II, as its name suggests, "deals with questions pertaining to systemic proportionality review," that is, "whether ethnic, racial or gender bias exists in the administration of our capital sentencing laws." Id. at 1. After reviewing briefs submitted by the Attorney General, the Public Defender and amici curiae, Association of Criminal Defense Lawyers of New Jersey and New Jersey State Conference of NAACP Branches, and on hearing oral argument, we adopt Baime Report II with modifications as outlined in this opinion.
Eight years ago, we stated that "we can never dispense with the obligation to assure that the burden of the past does not create a genuine risk that defendants will be sentenced to death either because of their race or the race of the victim." State v. Marshall, 130 N.J. 109, 219 (1992) (Marshall II), cert. denied, 507 U.S. 929, 113 S. Ct. 1306, 112 L. Ed. 2d 694 (1993). We turned to "[p]roportionality review therefore [as] a means through which to monitor . . . and thereby to prevent any impermissible discrimination in [the] imposi[tion of] the death penalty." State v. Ramseur, 106 N.J. 123, 327 (1987); see also Loftin II, supra, 157 N.J. at 315 ("This Court is committed to a course of review that is capable of discerning possible racial discrimination in our capital sentencing system.").
Yet, the development of a sound methodology for the purpose of systemic proportionality review has proved an elusive goal. Loftin II, supra, 157 N.J. at 305-16. The first statistical models conceived by Professor David C. Baldus, the Special Master appointed to oversee the development of a proportionality review system for the Court, were not designed to test for racial discrimination, but rather were focused on whether defendants with roughly equivalent levels of culpability were treated similarly. See id. at 310. In his report, Baldus informed the Court that
race variables [were included] in the culpability models to ensure that variables for legitimate case characteristics were not carrying any possible race effects. It was in the course of this work that we observed the race effects . . . . Because discrimination was not the primary mandate in this project, we consider these results to be strictly preliminary. More work will be required to determine if they persist under closer scrutiny and alternative analyses, to determine, for example, whether they are statistical artifacts or flukes, and to assess their legal and practical significance. [David C. Baldus, Death Penalty Proportionality Review Project: Final Report to the New Jersey Supreme Court 100-01 (Sept. 24, 1991) (Baldus Report).]
Despite this disclaimer, defendants have alleged racial discrimination in the administration of the death penalty premised largely on the statistical models created by Professor Baldus.
Last term, after having conducted four proportionality reviews between 1992 and 1995, we evaluated the reliability of the multiple regression models Baldus had devised. *fn1 Loftin II, supra, 157 N.J. at 308-16; see also State v. DiFrisco, 142 N.J. 148 (1995) (DiFrisco III) (using Baldus-created models for proportionality review), cert. denied, 516 U.S. 1129, 116 S. Ct. 949, 133 L. Ed. 2d 873 (1996); State v. Martini, 139 N.J. 3 (1994) (Martini II) (same), cert. denied, 516 U.S. 875, 116 S. Ct. 203, 113 L. Ed. 2d 137 (1995); State v. Bey, 137 N.J. 334 (1994) (Bey IV) (same), cert. denied, 513 U.S. 1164, 115 S. Ct. 1131, 130 L. Ed. 2d 1093 (1995); Marshall II, supra (same). We acknowledged that the models contained fundamental problems that precluded reliance on their results and that those problems were due both to intrinsic and extrinsic factors, i.e., the structure or design of the models themselves and too few cases. Loftin II, supra, 157 N.J. at 310-15. We had asked retired Judge Richard Cohen, with the assistance of a world-renowned statistician, Dr. John W. Tukey, *fn2 to assess those factors and others, and to report back to the Court within a short period of time. Id. at 302-03. See generally Richard S. Cohen, Report to the Supreme Court of New Jersey (Jan. 27, 1997) (Cohen Report). Because the Public Defender claimed that the models demonstrated an impermissible race effect, understanding the nature and impact of the models' shortcomings was critical.
Judge Cohen informed us that the multiple regressions were inherently unreliable because they were not parsimonious, a requisite for a reliable regression. Loftin II, supra, 157 N.J. at 310-11. "A statistical model with a relatively small number of well-crafted parameters is known as a parsimonious model." Id. at 311. The lack of parsimony in the Baldus models created a risk of overfitting, possibly resulting in the false attribution of effect to a variable. Id. at 311 n.13. In other words, considering the relatively small number of cases in the database, particularly death-sentenced cases, the regression models contained an excessive number of variables and, thus, statistical results suggesting racial discrimination may have reflected a methodological flaw rather than reality. Id. at 311. In addition to the problems related to the small size of the database, various data-coding decisions appeared to affect the model results in unintended and inappropriate ways. Id. at 311-12. Those concerns about the validity of the statistical models were substantiated by the disparate ratings of capitally-prosecuted defendants derived from the regressions and from culpability rankings produced by a survey of judges conducted for comparison purposes. Id. at 310. In short, the exploratory work of Judge Cohen, and the advice of Dr. Tukey, demonstrated the need for a more thorough consideration of parsimonious models, further exploration of other methodologies, and a good hard look at some of our earlier decisions about data-coding classifications.
Loftin II provides a detailed review of these matters. Id. at 302-16. Suffice it to say here that the need for further work led to a detailed charge to our present Special Master, Judge Baime, that included consideration of both individual and systemic proportionality review. Id. at 453-57. In addition to a review of data-coding choices and the projected size of the database over time, we specifically ordered that
(6) [t]he Special Master shall attempt to develop parsimonious statistical models for more reliable regression studies of race effect and shall consider whether the process of purging, i.e., the removal of the indirect effects of race from variables that appear to be unrelated to race, produces results that are useful; [and that]
(7) [t]he Special Master shall consider Special Master Cohen's recommendation, submitted in State v. Loftin, supra, that the Court appoint a panel of judges to perform periodic assessments of penalty-trial outcomes, along with the composition and mandate of such an independent judicial panel, as independent verification of the culpability ratings derived from the models. [Id. at 456.]
In Baime Report II, Judge Baime addressed those questions and others. With assistance from Professors David Weisburd *fn3 and Joseph Naus, *fn4 he considered new approaches to the study of systemic discrimination and made recommendations to the Court. We consider Judge Baime's systemic recommendations in this opinion.
Before we begin our discussion of the Special Master's report, however, some further reflection on systemic review is in order. The study of system-wide discrimination requires the use of statistical techniques in complex socio-political settings. The process is far more complicated than counting the number of defendants by race and the number of death penalties meted out, although it certainly includes such elementary comparative analyses. A myriad of discretionary decisions are made at every level in the system, and sorting out their relationship to the race of either defendants or victims is complex and difficult. We have learned that statistical modeling for that purpose is largely untested and that its usefulness is uncertain. The improvements and additions we approve today will need further review down the road. We make these choices because we know of no other means by which the relationship, if any, between race and the death penalty system in New Jersey may be reviewed. The importance of understanding whether racial discrimination infects our system of capital punishment requires that we make this effort.
Judge Baime recommends a process for monitoring the possible presence of racial discrimination in the administration of the death penalty. Since the Baldus models were not developed with this goal in mind, his proposal would "constitute our first systematic effort to develop a statistical framework devised for the specific purpose of analyzing the possibility of racial discrimination in New Jersey's capital punishment scheme." Baime Report II at 6. He explains in broad terms:
[T]he monitoring system I propose rests on the assumption that there is no single method that is sufficiently reliable to provide convincing evidence of a race effect in death penalty sentencing. I recommend a multifaceted approach consisting of bivariate analysis, regression studies, case exploratory analysis, and a precedent-seeking type review of the cases. [Id. at 36-37.]
We analyze each component of the proposed system in turn.
Judge Baime first describes a series of bivariate analyses as part of the "multifaceted" approach he suggests. In a bivariate analysis, there is only one independent variable. Here, because we are testing for the presence of racial discrimination, race is that single independent variable. See Romero v. City of Pomona, 665 F. Supp. 853, 859 (C.D. Cal. 1987). These analyses are designed, then, to test whether there are statistically significant differences based on the race of the defendant or the race of the victim in respect of the rate at which defendants eligible for the death penalty are sentenced to death, the rate at which those defendants are prosecuted capitally, and the rate at which juries sentence capitally-prosecuted defendants to death. See Baime Report II at 49-55.
We adopt Judge Baime's recommendation to include a series of bivariate analyses as part of our systemic review. *fn5 This approach allows us to observe racial distributions in capital sentencing based on the raw data and without consideration of any variables other than race and sentence. The comparison is simple and easily understood. Nevertheless, we recognize, as does Judge Baime, that the utility of bivariate analyses is limited. Bivariate analyses do not take other factors, such as each defendant's deathworthiness, into account, but rather consider only the unadjusted relationship between race and outcome, i.e., advancement to penalty trial or imposition of the death penalty. This inability to control for other factors could cause misleading results, including "false positives" or the attribution of race effects in instances where no race effects exist. Conversely, bivariate analyses could produce false negatives, in which case the analyses would not show race effects despite the presence of racial discrimination.
Because of their inability to control for nonracial factors, bivariate analyses cannot be the only methodology we use to examine racial discrimination in capital sentencing. See David Weisburd and Joseph Naus, Report to Special Master Baime Concerning Systemic Proportionality Review 24 (Nov. 24, 1999) (Weisburd and Naus Report) (attached as Technical Appendix to Baime Report II) ("It is important to keep in mind at the outset, that it is not possible to develop a reliable monitoring system of race effects using only bivariate methods. Any bivariate relationship between race and death sentencing is likely to be confounded by other factors that also influence death penalty sentencing."). Despite their inherent limitations, we include bivariate analyses in the system we approve today because of their simplicity and because, as part of a multi-faceted approach, they may help to shed some light on any relationship between race and sentencing outcome.
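By way of illustration only, the following sketch shows how a bivariate comparison of the kind described above might be computed, and how an uncontrolled factor can produce a "false positive." It is not the method used in Baime Report II or in the Weisburd and Naus Report. The function name `rate_comparison`, the group labels, the two culpability levels, and every count are hypothetical assumptions invented for the example, and the pooled two-proportion z-test is simply one common way to compare two rates.

```python
import math


def rate_comparison(sentenced_a, total_a, sentenced_b, total_b):
    """Pooled two-proportion z-test on the death-sentence rates of two groups.

    Returns (rate_a, rate_b, z, two_sided_p).
    """
    rate_a = sentenced_a / total_a
    rate_b = sentenced_b / total_b
    pooled = (sentenced_a + sentenced_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("all rates are 0 or 1; the test is undefined")
    z = (rate_a - rate_b) / std_err
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return rate_a, rate_b, z, p


# HYPOTHETICAL counts, invented for illustration; not data from the record.
# Each entry is (death sentences, death-eligible defendants).
strata = {
    "high culpability": {"group_a": (40, 80), "group_b": (10, 20)},
    "low culpability": {"group_a": (2, 20), "group_b": (8, 80)},
}

# Bivariate view: pool the strata so that group is the only independent variable.
totals = {
    group: [sum(level[group][i] for level in strata.values()) for i in (0, 1)]
    for group in ("group_a", "group_b")
}
aggregate = rate_comparison(*totals["group_a"], *totals["group_b"])
print("aggregate:", aggregate)

# Controlled view: repeat the same comparison within each culpability level.
by_stratum = {
    name: rate_comparison(*level["group_a"], *level["group_b"])
    for name, level in strata.items()
}
for name, result in by_stratum.items():
    print(name + ":", result)
```

In these invented figures the pooled rates differ sharply (42 of 100 against 18 of 100), yet within each culpability level the two groups are sentenced at identical rates. The apparent disparity arises only because the groups are distributed differently across culpability levels, which is the kind of confounding that a bivariate analysis cannot detect.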
Creating reliable multiple regression models has been the biggest challenge in systemic proportionality review. With the assistance of Professors Weisburd and Naus, Judge Baime proposes a methodology for the development of parsimonious multiple regressions that would be the second component in the monitoring system he recommends to the Court.
As discussed earlier, see ante at ___ (slip op. at 6-7), aside from coding and other like issues, the regression models developed by Baldus suffer from a fundamental defect -- a small number of cases and far too many variables "to achieve [even] a minimal degree of statistical reliability . . . ." Loftin II, supra, 157 N.J. at 311. Intuitively, we expect that many different variables influence the likelihood of receiving a capital sentence and so we want to include all of them in our model. We have learned, however, that too many factors and too few cases can result in a finding of race effect where it does not exist, a problem we noted in connection with bivariate analysis, and also can prevent accurate measurement of the true effects of each factor. Cohen Report at 27-28. Because our database contains only fifty-three death-sentenced cases, see Baime Report II at 53-55, multiple regressions using the imposition of a death sentence as the dependent variable must have a limited number of independent variables for there to be a reasonably parsimonious model. Loftin II, supra, 157 N.J. at 311. According to Judge Cohen and Dr. Tukey, given the number of cases progressing through our system, multiple regressions measuring the factors that are "designed to test racial bias should employ between five and ten parameters or variables . . . ." Ibid.; see John W. Tukey, Report to the Special Master 5 (Jan. 27, 1997).
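The arithmetic behind that constraint can be sketched as follows. The figure of fifty-three death-sentenced cases and the range of five to ten parameters come from the sources cited above. The helper name `cases_per_parameter` and the thirty-variable comparison model are hypothetical assumptions added for contrast, and the ratio is offered only as a rough gauge of parsimony, not as a test applied by Judge Cohen, Dr. Tukey, or the Special Master.

```python
def cases_per_parameter(n_cases, n_parameters):
    """Return how many cases are available, on average, to estimate each parameter."""
    if n_parameters <= 0:
        raise ValueError("a model needs at least one parameter")
    return n_cases / n_parameters


DEATH_SENTENCED_CASES = 53  # figure stated in the text, citing Baime Report II

# Range of five to ten parameters attributed to Judge Cohen and Dr. Tukey.
ratios = {k: cases_per_parameter(DEATH_SENTENCED_CASES, k) for k in range(5, 11)}
for k, ratio in ratios.items():
    print(f"{k:2d} parameters -> {ratio:.1f} death-sentenced cases per parameter")

# HYPOTHETICAL larger model, assumed here only to show how quickly the ratio falls.
HYPOTHETICAL_LARGE_MODEL = 30
crowded = cases_per_parameter(DEATH_SENTENCED_CASES, HYPOTHETICAL_LARGE_MODEL)
print(f"{HYPOTHETICAL_LARGE_MODEL} parameters -> {crowded:.1f} cases per parameter")
```

Within the recommended range each parameter is supported by roughly five to eleven death-sentenced cases. In the assumed thirty-variable model fewer than two cases support each parameter, which illustrates why a model of that size invites overfitting on a database this small.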
Lack of parsimony was a principal reason why we abandoned the index-of-outcomes test previously used in individual proportionality review to measure defendants' culpability levels. See Proportionality Review I, supra, 161 N.J. at 91-96. The inability to design parsimonious regression models for individual proportionality review, however, does not necessarily prevent the development of parsimonious regression models for systemic ...