I have a variance-covariance matrix as part of my optimization problem, and I want to get the Cholesky factor. The parameters that I am trying to estimate in the optimization are standard deviations of 7 variables and the correlation coefficient between the 7 variables.
# The first eight values are irrevalent (which is another part for my optimization problem).
# These values are initiated by scipy.optimize.differential_evolution.
params = np.array([ 0.7806441 , 0.43681974, 0.44591321, 0.40755392, 0.95581486,
0.45218698, 0.72242087, 0.10477314, 0.49510088, 0.46714018,
0.06582374, 0.11473761, 0.07053657, 0.4480119 , 0.47490398,
-0.58165201, 0.80810424, -0.42242146, 0.16330766, 0.88495337,
-0.73821011, 0.82768099, 0.66367593, -0.84882425, 0.51292046,
-0.78708664, -0.97173908, 0.5514854 , -0.1384802 , -0.00899825,
0.55532738, 0.4374165 , 0.63025894, 0.75539676, -0.08648026,
0.40995866])
# standard errors params[8:15], bounded between 0 and 0.5
# because the variable is between 0 and 1.
# I have 7 variables, so here each element in 'dev' is the sd of one variable.
dev = np.diag(params[8:15])
# correlation coefficients params[15:35], bounded between -1 and 1.
# params[15:21]: rho11, rho12, ... rho17.
corr1 = np.insert(params[15:21],0,1.0)
# params[21:26]: rho23, rho24, ... rho27.
corr2 = np.insert(np.concatenate((np.zeros(1), params[21:26])), 1, 1.0)
corr3 = np.insert(np.concatenate((np.zeros(2), params[26:30])), 2, 1.0)
corr4 = np.insert(np.concatenate((np.zeros(3), params[30:33])), 3, 1.0)
corr5 = np.insert(np.concatenate((np.zeros(4), params[33:35])), 4, 1.0)
corr6 = np.insert(np.concatenate((np.zeros(5), params[35:36])), 5, 1.0)
corr7 = np.append(np.zeros(6), 1.0)
X=np.vstack((corr1,corr2,corr3,corr4,corr5,corr6,corr7))
X = X + X.T - np.diag(np.diag(X))
cov = dev.dot(X).dot(dev)
L = np.linalg.cholesky(cov)
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "/Library/Python/3.9/site-packages/numpy/linalg/linalg.py", line 779, in cholesky
r = gufunc(a, signature=signature, extobj=extobj)
File "/Library/Python/3.9/site-packages/numpy/linalg/linalg.py", line 115, in _raise_linalgerror_nonposdef
raise LinAlgError("Matrix is not positive definite")
numpy.linalg.LinAlgError: Matrix is not positive definite
# check egienvalues
print(np.linalg.eigvalsh(cov))
[-0.22219591 -0.00343992 0.00541225 0.0116191 0.27866962 0.38321562
0.45878541]
Why does my variance-covariance matrix cov has negative egienvalues? Where did I do wrong?
Since any covariance matrix is guaranteed to be positive definite, I went looking for ways in which this covariance matrix contained some kind of mistake or internal contradiction.
The first thing I looked at was this line:
Although this line looked suspicious to me, I coded an alternative implementation which multiplies standard error along rows and columns, and it agrees with this line.
The second thing I looked at was the correlation coefficient matrix. Although correlation is not, in general, transitive, for sufficiently strong correlation, it does obey some transitive properties.
Using the inequality in the linked question, I looked for contradictions among the correlation coefficients. Here is the correlation matrix, X, before multiplying in the standard deviations.
In this matrix, the correlation between 1 and 2 is 0.82768099. The correlation between 2 and 0 is 0.80810424. By the inequality in the linked question, the correlation between 0 and 1 must be positive. But it's negative. The program I wrote finds many other problems like this one. This set of correlations is not possible, and the result is that the covariance matrix is not possible either.
Code: