## Exploiting Spatial Structure for Improved False Discovery Rate: A Review of smoothfdr

In statistical analysis, multiple-testing problems arise when a large number of hypotheses are tested simultaneously. A common way to limit false positives is to control the False Discovery Rate (FDR), the expected proportion of false discoveries among all discoveries. However, existing FDR-controlling methods that ignore spatial structure may fail to detect spatially localized regions of significant test statistics, which leads to false negatives or less biologically plausible results.
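To make the baseline concrete, here is a minimal sketch of the Benjamini-Hochberg procedure, a widely used FDR-controlling method. It is chosen here as a representative example; the text above does not name a specific method, and the function name `benjamini_hochberg` is hypothetical. Note that the procedure looks only at the sorted p-values and never at where each test sits on the grid.

```python
import numpy as np


def benjamini_hochberg(p_values, fdr_level=0.05):
    """Return a boolean mask of discoveries under Benjamini-Hochberg."""
    p = np.asarray(p_values, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)
    # Compare the k-th smallest p-value with its threshold (k / m) * fdr_level.
    below = flat[order] <= fdr_level * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = np.nonzero(below)[0].max()
        mask[order[: k + 1]] = True
    return mask.reshape(p.shape)


p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
discoveries = benjamini_hochberg(p_vals, fdr_level=0.05)
```

Because the spatial position of each test plays no role, a cluster of moderately small p-values in one region is treated exactly like the same values scattered at random.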

The `smoothfdr` package provides an empirical-Bayes method that addresses this limitation by exploiting the spatial structure inherent in the data. By automatically identifying spatially localized regions of significance and adjusting the statistical significance threshold accordingly, FDR smoothing enhances the power to detect signals and separate them from noise. This article reviews the key features of the `smoothfdr` package and its potential applications.

### Installation and Getting Started

To start using the `smoothfdr` package, install the Python version with pip:

`pip install smoothfdr`

Once installed, you can run the tool directly from the terminal by entering the command `smoothfdr`. Alternatively, if you want to integrate it into your code, simply import the `smoothfdr` module.

### Running an Example

To illustrate the benefits of FDR smoothing, let's consider a simple example using the provided dataset `data.csv`, a 128×128 grid. This dataset contains two plateaus of increased prior probability of a signal. To perform FDR smoothing on this dataset, you can use the following code:

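```python
import numpy as np
from smoothfdr.easy import smooth_fdr

# Load the 128x128 example grid
data = np.loadtxt('example/data.csv', delimiter=',')

# Target false discovery rate
fdr_level = 0.05

# Run FDR smoothing; entries equal to missing_val are treated as missing
results = smooth_fdr(data, fdr_level, verbose=5, missing_val=0)
```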
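If the packaged `example/data.csv` is not at hand, a stand-in grid with the same general shape can be simulated. The sketch below is an illustration under stated assumptions, not the package's own data generator: the plateau positions, the prior probabilities (0.05, 0.9, 0.7), and the z-score model (null N(0, 1), signal N(2, 1)) are all invented for the example, and `make_plateau_data` is a hypothetical helper.

```python
import numpy as np


def make_plateau_data(seed=0, shape=(128, 128)):
    """Simulate z-scores on a grid with two plateaus of elevated signal prior."""
    rng = np.random.default_rng(seed)

    # Assumed prior probability of a signal at each site.
    prior = np.full(shape, 0.05)      # assumed background level
    prior[20:50, 20:50] = 0.9         # assumed first plateau
    prior[70:110, 60:100] = 0.7       # assumed second plateau

    # Draw which sites carry a signal, then draw the z-scores.
    is_signal = rng.random(shape) < prior
    # Assumed model: null sites ~ N(0, 1), signal sites ~ N(2, 1).
    z = rng.normal(loc=np.where(is_signal, 2.0, 0.0), scale=1.0)
    return z, prior, is_signal


z_scores, true_prior, truth = make_plateau_data(seed=0)
# np.savetxt('synthetic_data.csv', z_scores, delimiter=',')  # optional: save as CSV
```

Having the true prior and the true signal mask alongside the simulated z-scores makes it possible to compare any estimated prior or discovery map against ground truth.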

By default, the `smooth_fdr` function assumes a multidimensional grid that has the same shape as the `data` array. However, if your points are connected differently, you can transform `data` into a one-dimensional vector and pass a list of `(x1, x2)` pairs via the `edges` parameter. Additionally, if your grid has missing data points (e.g., in an fMRI scan), you can specify the value used to indicate missing data with the `missing_val` parameter.
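As an illustration of what such an edge list might look like, the sketch below builds the pairs for an ordinary 2-D grid by hand. It assumes that each `(x1, x2)` pair holds indices into the flattened, row-major data vector; that interpretation, and the helper name `grid_edges`, are assumptions made for this example, so check the package documentation for the exact format it expects. The code uses only NumPy and does not call `smoothfdr`.

```python
import numpy as np


def grid_edges(shape):
    """List (i, j) pairs linking each cell of a 2-D grid to its right and
    lower neighbours, using row-major flat indices."""
    rows, cols = shape
    idx = np.arange(rows * cols).reshape(rows, cols)
    # Horizontal neighbours: each cell paired with the cell to its right.
    horizontal = np.column_stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()])
    # Vertical neighbours: each cell paired with the cell below it.
    vertical = np.column_stack([idx[:-1, :].ravel(), idx[1:, :].ravel()])
    pairs = np.vstack([horizontal, vertical])
    return [(int(a), int(b)) for a, b in pairs]


grid = np.arange(12.0).reshape(3, 4)   # stand-in for a small data grid
flat_data = grid.ravel()               # one-dimensional vector of observations
edges = grid_edges(grid.shape)         # candidate value for the `edges` parameter
```

For a graph that is not a grid, the same list-of-pairs form applies: drop, add, or rewire pairs to match how the points are actually connected.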

### Visualizing the Results

Once you have run the FDR smoothing algorithm, you can analyze the results using the returned dictionary. For visualization purposes, you can plot the raw data, the smoothed prior, the posteriors, and the discoveries at the specified FDR level.

```python
import matplotlib.pyplot as plt

# 2x2 panel: raw data, smoothed prior, posteriors, discoveries
fig, ax = plt.subplots(2, 2)

ax[0, 0].imshow(data, cmap='gray_r')
ax[0, 0].set_title('Raw data')

ax[0, 1].imshow(results['priors'], cmap='gray_r', vmin=0, vmax=1)
ax[0, 1].set_title('Smoothed prior')

ax[1, 0].imshow(results['posteriors'], cmap='gray_r', vmin=0, vmax=1)
ax[1, 0].set_title('Posteriors')

ax[1, 1].imshow(results['discoveries'], cmap='gray_r', vmin=0, vmax=1)
ax[1, 1].set_title('Discoveries at FDR={0}'.format(fdr_level))

# Write the figure to disk
plt.savefig('results.png')
```

The generated image provides a visual representation of the FDR smoothing results, allowing you to observe the spatial separation of signals from noise.
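To see how a posterior map can relate to a discovery map, the following sketch applies a common Bayesian FDR rule: rank sites by posterior probability of being a signal, then report the largest set whose average posterior probability of being null stays at or below the FDR level. This is a generic illustration, not necessarily how `smoothfdr` computes its `discoveries` entry internally, and `discoveries_from_posteriors` is a hypothetical name.

```python
import numpy as np


def discoveries_from_posteriors(posteriors, fdr_level=0.05):
    """Select the largest set of sites whose expected false discovery
    proportion, estimated from the posteriors, is at most fdr_level."""
    post = np.asarray(posteriors, dtype=float)
    flat = post.ravel()
    order = np.argsort(-flat)  # most confident sites first
    # Expected false discovery proportion if the top-k sites are reported.
    expected_fdr = np.cumsum(1.0 - flat[order]) / np.arange(1, flat.size + 1)
    passing = np.nonzero(expected_fdr <= fdr_level)[0]
    mask = np.zeros(flat.size, dtype=bool)
    if passing.size:
        mask[order[: passing.max() + 1]] = True
    return mask.reshape(post.shape)


toy_posteriors = np.array([[0.99, 0.10],
                           [0.50, 0.97]])
toy_discoveries = discoveries_from_posteriors(toy_posteriors, fdr_level=0.05)
```

Under this rule, a site inside a high-prior region can be reported with a weaker test statistic than an isolated site, because the smoothed prior raises its posterior.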

### Conclusion and Further Reading

The `smoothfdr` package introduces an empirical-Bayes method for exploiting spatial structure in large multiple-testing problems. By incorporating FDR smoothing into statistical analysis workflows, researchers and data scientists can enhance the power to detect signals and achieve more accurate and biologically plausible results.

For more in-depth understanding and detailed algorithmic information, you can refer to the paper on arXiv titled “False Discovery Rate Smoothing” by W. Tansey, O. Koyejo, R. A. Poldrack, and J. G. Scott.

In summary, the `smoothfdr` package offers a valuable tool for spatially aware false discovery rate control. Its easy installation, customizable parameters, and straightforward visualization of results make it accessible to researchers and analysts across many fields.
