When analyzing relationships between variables in PostgreSQL, many turn to the Pearson correlation coefficient (corr(X, Y)). While useful, it only captures linear relationships. Many real-world datasets contain relationships that aren’t strictly linear. Consider cases like:

  • Diminishing returns in economics (e.g., increasing ad spend vs. revenue growth)
  • Biological or medical data with exponential growth patterns
  • Logarithmic or polynomial trends in machine learning features

In these scenarios, corr(X, Y) might report a weak correlation. This is where the xi (ξ) correlation coefficient comes into play: Unlike Pearson’s correlation, Xi Correlation can detect both linear and nonlinear functional relationships between variables.

For more information on XI, see the original paper. A New Coefficient of Correlation S. Chatterjee (2020)

pgxicor is a PostgreSQL extension that introduces the Xi Correlation (xicor(X, Y)) aggregate function.

CREATE EXTENSION xicor; 
       
CREATE TABLE xicor_test (x float8, y float8);
INSERT INTO xicor_test (x, y)
VALUES
    (1.0, 2.0),
    (2.5, 3.5),
    (3.0, 4.0),
    (4.5, 5.5),
    (5.0, 6.0);

-- Query to calculate the Xi correlation using the aggregate function
SELECT xicor(x, y) 
FROM xicor_test;

I’ve published this extension on GitHub on Florents-Tselai/pgxicor.

If you’re interested in this, also check out vasco; Another similar extension is based on the Maximal Information Coefficient (MIC). A standalone C implementation of ξ is also available libxicor.