| Title: | Segmentation Approaches in Chemometrics |
|---|---|
| Description: | Evaluation of prediction performance of smaller regions of spectra for Chemometrics. Segmentation of spectra, evolving dimensions regions and sliding windows as selection methods. Election of the best model among those computed based on error metrics. Chen et al.(2017) <doi:10.1007/s00216-017-0218-9>. |
| Authors: | Elia Gonzato [aut, cre, cph] |
| Maintainer: | Elia Gonzato <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-04 08:00:49 UTC |
| Source: | https://github.com/egonzato/windows.pls |
The beer dataset contains 60 samples published by Norgaard et al. The spectra were recorded with a 30 mm quartz cell on undiluted, degassed beer and measured from 1100 to 2250 nm (576 data points) in steps of 2 nm. It is a good playground for regression methods that start from spectral intensities.
beer
A data frame with 80 rows and 577 columns:
Original extract concentration
Intensities measured on 576 different data points
https://www.kaggle.com/datasets/robertoschimmenti/beer-nir?resource=download
Norgaard, L., Saudland, A., Wagner, J., Nielsen, J. P., Munck, L., & Engelsen, S. B. (2000). Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413–419. Adapted from an R dataset available as part of the OHPL package (https://search.r-project.org/CRAN/refmans/OHPL/html/00Index.html).
Turns wavelengths into variable names.
convert.names.wl(start = NULL, stop = NULL, step = 2)
start |
First wavelength of the spectra. |
stop |
Last wavelength of the spectra. |
step |
Distance between each recorded wavelength. |
Returns a vector with a syntactically valid name for each wavelength.
data(beer)
X=beer[,2:ncol(beer)]
head(names(X))
names(X)=convert.names.wl(1100,2250,2)
head(names(X))
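The idea behind this conversion can be sketched in base R. This is only an illustration, not the package's implementation: the helper `wl_names_sketch` is hypothetical, and the `X1100` naming style produced by `make.names()` is an assumption about what "syntactically valid" looks like here; the actual output of `convert.names.wl` may differ.

```r
# Hypothetical helper, for illustration only (not part of windows.pls).
wl_names_sketch <- function(start, stop, step = 2) {
  wl <- seq(start, stop, by = step)  # wavelength grid, e.g. 1100, 1102, ...
  make.names(wl)                     # prefix numbers so they are valid names
}

nm <- wl_names_sketch(1100, 2250, 2)
length(nm)   # one name per recorded wavelength
head(nm, 3)
```

With the beer grid (1100 to 2250 nm in steps of 2 nm) the sketch yields one name per data point, matching the 576 spectral columns described for the dataset.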
Computes and stores cross-validation metrics for one of the three possible modes: ‘wpls’, ‘ewpls’ or ‘swpls’.
cv.wpls(
  xblock = NULL,
  yblock = NULL,
  windows = 3,
  window.size = 30,
  increment = 10,
  cv = 10,
  scale = FALSE,
  ncp = 10,
  mode = "wpls"
)
xblock |
A matrix containing one spectrum for each observation. |
yblock |
A vector containing the concentration associated with each spectrum in the xblock matrix. |
windows |
Parameter used when either ‘wpls’ or ‘ewpls’ is chosen. Number of windows into which the spectra are divided. |
window.size |
Parameter used when ‘swpls’ is chosen. Indicates the width of the window that slides along the spectra. |
increment |
Parameter used when ‘swpls’ is chosen. Indicates how many steps the window slides forward. |
cv |
Number of segments used for cross-validation. |
scale |
Logical; whether to standardize the variables. |
ncp |
Maximum number of components to be computed for each model. |
mode |
'wpls','ewpls' or 'swpls', see Details for more. |
NIR and Vis-NIR spectroscopy produce spectra that may carry useful information about the composition of the samples under investigation. Since spectroscopy began to be combined with multivariate statistical methods, researchers have asked whether the entire spectrum should be used or whether dividing it into smaller regions gives better predictive performance. Several methods have been proposed, from selecting only some regions to selecting the best-performing combinations of regions. This function provides three options:
‘wpls’ (Window PLS) divides the original spectra into several windows, fits a PLS model on each and stores metrics of interest, such as RMSE and R2, for both calibration and cross-validation.
‘ewpls’ (Evolving Window PLS) divides the original spectra into several windows, but each new window incorporates the previous ones, so that smaller windows are compared with progressively larger ones, up to the entire spectra.
‘swpls’ (Sliding Window PLS) takes the width of the window used to fit the model and the step by which the window moves forward along the spectra before a new model is fitted. The window thus slides along the spectra and produces several models, which are compared through their metrics.
This function offers a simpler version of iPLS, available in the mdatools package, which divides the spectra into smaller segments and searches for the combination with the lowest RMSE in cross-validation.
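The three modes differ only in which columns of the spectra enter each model. The sketch below shows one plausible way to build those column indices in base R. All three helpers are hypothetical and not part of the package; in particular, treating `increment` as a number of columns and dropping a trailing incomplete window are assumptions.

```r
# Hypothetical helpers, for illustration only (not part of windows.pls).

# 'wpls': split p columns into `windows` contiguous, near-equal blocks.
split_windows <- function(p, windows) {
  block <- cut(seq_len(p), breaks = windows, labels = FALSE)
  split(seq_len(p), block)
}

# 'ewpls': each window also contains all the previous ones.
evolving_windows <- function(p, windows) {
  w <- split_windows(p, windows)
  lapply(seq_along(w), function(i) unlist(w[seq_len(i)], use.names = FALSE))
}

# 'swpls': fixed-width window moved forward by `increment` columns
# (assumption: a trailing window that does not fit is dropped).
sliding_windows <- function(p, window.size, increment) {
  starts <- seq(1, p - window.size + 1, by = increment)
  lapply(starts, function(s) s:(s + window.size - 1))
}

p  <- 576  # number of wavelengths in the beer spectra
w  <- split_windows(p, 3)
ew <- evolving_windows(p, 3)
sw <- sliding_windows(p, window.size = 30, increment = 10)

lengths(w)   # width of each separate window
lengths(ew)  # widths grow until the full spectrum is covered
length(sw)   # number of sliding-window models
```

A PLS model would then be fitted on `xblock[, idx]` for each index vector `idx`, and the resulting metrics compared across windows.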
Returns a list containing:
xblock |
Matrix containing spectra used to train the model. |
yblock |
Vector containing values of the dependent variable. |
cal |
List containing RMSE and R2 of calibration. |
cv |
List containing RMSE and R2 of cross-validation. |
ncp |
Number of components used to compute the model. |
scale |
Contains logical condition used for standardization. |
cv.segment |
Number of segments used for cross-validation. |
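For orientation, the two metrics stored in `cal` and `cv` can be written in a few lines of base R. These are the usual textbook definitions, given here as an assumption; the package (through mdatools) may compute them with slightly different conventions.

```r
# Illustrative definitions; the package may compute these differently.
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))
r2   <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

y    <- c(1, 2, 3, 4)           # toy reference values
yhat <- c(1.5, 1.5, 3.5, 3.5)   # toy predictions
rmse(y, yhat)
r2(y, yhat)
```

For calibration, `yhat` would be the fitted values on the training data; for cross-validation, it would be the predictions made on each held-out segment.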
Chen, J., Yin, Z., Tang, Y. et al. Vis-NIR spectroscopy with moving-window PLS method applied to rapid analysis of whole blood viscosity. Anal Bioanal Chem 409, 2737–2745 (2017).
Y.P. Du, Y.Z. Liang, J.H. Jiang, R.J. Berry, Y. Ozaki, Spectral regions selection to improve prediction ability of PLS models by changeable size moving window partial least squares and searching combination moving window partial least squares, Analytica Chimica Acta, Volume 501, Issue 2, 2004, Pages 183-191.
mdatools package, https://github.com/svkucheryavski/mdatools
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
Plots in a single window the R2 of each model.
global.r2(
  wpls = NULL,
  col.cal = "blue",
  col.cv = "red",
  col.strip.background = "orange",
  xlab = NULL,
  ylab = NULL,
  title = NULL
)
wpls |
object obtained from cv.wpls. |
col.cal |
color for the calibration line. |
col.cv |
color for the cross-validation line. |
col.strip.background |
color of the banner for each window. |
xlab |
title of the x axis. |
ylab |
title of the y axis. |
title |
title of the plot. |
Plot of R2 of each spectral region used to compute PLS.
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
global.r2(mywpls,col.cal='navy', col.cv='red', col.strip.background='orange', xlab='Component', ylab=expression(R^2))
Plots in a single window the RMSE of each model.
global.rmse(
  wpls = NULL,
  col.cal = "blue",
  col.cv = "red",
  col.strip.background = "steelblue",
  xlab = NULL,
  ylab = NULL,
  title = NULL
)
wpls |
object obtained from cv.wpls. |
col.cal |
color for the calibration line. |
col.cv |
color for the cross-validation line. |
col.strip.background |
color of the banner for each window. |
xlab |
title of the x axis. |
ylab |
title of the y axis. |
title |
title of the plot. |
Plot of RMSE of each spectral region used to compute PLS.
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
global.rmse(mywpls,col.cal='navy', col.cv='red', col.strip.background='orange', xlab='Component', ylab='RMSE')
Plots spectra highlighting windows with the best performance.
map.best.window(
  wpls = NULL,
  fade = 0.7,
  col.window = "steelblue",
  xlab = "Wavelengths",
  ylab = "Absorbance",
  title = NULL,
  legend = NULL
)
wpls |
object obtained from cv.wpls. |
fade |
opacity of the window. |
col.window |
color of the window that highlights the region. |
xlab |
title of the x axis. |
ylab |
title of the y axis. |
title |
title of the plot. |
legend |
description description |
Plot of the spectra with a window that highlights the region with the lowest cross-validation error.
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
map.best.window(mywpls)
Colors and plots each spectrum based on the associated concentration of the outcome variable.
map.spectra.gradient(
  xblock = NULL,
  yblock = NULL,
  legend.title = "Gradient",
  plot.title = "Spectra with gradient based on Y variable",
  xlab = "Wavelength",
  ylab = "Absorbance",
  grad = 10,
  l.width = 0.75,
  col.legend = NULL
)
xblock |
A matrix containing one spectrum for each observation. |
yblock |
A vector containing the concentration associated with each spectrum in the xblock matrix. |
legend.title |
Title of the legend which displays the gradient. |
plot.title |
Title of the plot. |
xlab |
Title of the x axis. |
ylab |
Title of the y axis. |
grad |
Number of colors for the gradient's palette. |
l.width |
Line width of each spectrum. |
col.legend |
Removes the legend. |
Plot with the spectra of all observations, colored according to the associated concentration.
data(beer)
X=beer[,2:ncol(beer)]
names(X)=convert.names.wl(1100,2250,2)
Y=unlist(beer[,1])
map.spectra.gradient(X,Y)
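The mapping from concentration to color can be sketched in base R. This is not the package's plotting code: the blue-to-red palette, the equal-width binning with `cut()` and the toy concentrations are all assumptions made for illustration; only the role of `grad` as the number of colors comes from the argument description above.

```r
# Hypothetical sketch, not the package's plotting code.
y    <- c(4.2, 7.9, 11.5, 15.1, 18.8)   # toy concentrations
grad <- 10                               # number of colors in the palette

pal  <- grDevices::colorRampPalette(c("blue", "red"))(grad)
bin  <- cut(y, breaks = grad, labels = FALSE)  # equal-width bins over range(y)
cols <- pal[bin]                               # one color per spectrum
cols
```

Each spectrum would then be drawn with its own entry of `cols`, so that samples with low and high concentration sit at opposite ends of the gradient.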
Plots R2 of calibration and cross-validation of a single window.
r2.single.window(
  wpls = NULL,
  condition = "Complete",
  shape.cal = 19,
  shape.cv = 19,
  width = 1,
  size = 2,
  col.cal = "blue",
  col.cv = "red",
  xaxis.title = "Component",
  yaxis.title = expression(R^2),
  title = paste("Plot of R2 for the", condition, "model"),
  legend.name = NULL,
  x.legend = 0.9,
  y.legend = 0.2
)
wpls |
object obtained from cv.wpls. |
condition |
name of the Window the user wants to plot. |
shape.cal |
shape of the point of the calibration line. |
shape.cv |
shape of the point of the cross-validation line. |
width |
width of the line. |
size |
size of the points of calibration and cross-validation. |
col.cal |
color for the calibration line. |
col.cv |
color for the cross-validation line. |
xaxis.title |
title of the x axis. |
yaxis.title |
title of the y axis. |
title |
title of the plot. |
legend.name |
displays legend and its name. |
x.legend |
position of the legend on the x axis, ranges from 0 to 1. |
y.legend |
position of the legend on the y axis, ranges from 0 to 1. |
Plot of R2 of the region requested by the user.
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
r2.single.window(mywpls,'Window2')
Plots RMSE of calibration and cross-validation of a single window.
rmse.single.window(
  wpls = NULL,
  condition = "Complete",
  shape.cal = 19,
  shape.cv = 19,
  width = 1,
  size = 2,
  col.cal = "blue",
  col.cv = "red",
  xaxis.title = "Component",
  yaxis.title = "RMSE",
  title = paste("Plot of RMSE for the", condition, "model"),
  legend.name = NULL,
  x.legend = 0.1,
  y.legend = 0.2
)
wpls |
object obtained from cv.wpls. |
condition |
name of the Window the user wants to plot. |
shape.cal |
shape of the point of the calibration line. |
shape.cv |
shape of the point of the cross-validation line. |
width |
width of the line. |
size |
size of the points of calibration and cross-validation. |
col.cal |
color for the calibration line. |
col.cv |
color for the cross-validation line. |
xaxis.title |
title of the x axis. |
yaxis.title |
title of the y axis. |
title |
title of the plot. |
legend.name |
displays legend and its name. |
x.legend |
position of the legend on the x axis, ranges from 0 to 1. |
y.legend |
position of the legend on the y axis, ranges from 0 to 1. |
Plot of RMSE of the region requested by the user.
data(beer)
conc=unlist(beer[,1])
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
rmse.single.window(mywpls,'Window2')
Displays how spectra are divided in windows
segment.windows(
  xblock = NULL,
  yblock = NULL,
  windows = 3,
  fade = 0.3,
  xlab = "Wavelength",
  ylab = "Absorbance",
  title = paste("Spectra divided in", windows, "segments", sep = " "),
  legend = NULL,
  grad = 10
)
xblock |
A matrix containing one spectrum for each observation. |
yblock |
A vector containing the concentration associated with each spectrum in the xblock matrix. |
windows |
Number of windows into which the spectra are divided. |
fade |
Opacity of the window. |
xlab |
Title of the x axis. |
ylab |
Title of the y axis. |
title |
Title of the plot. |
legend |
Name of the substance which drives the gradient of spectra’s mapping. |
grad |
Number of colors that are used to build the gradient. |
Plot of spectra in which segments have a different background color.
data(beer)
conc=unlist(beer[,1])
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
segment.windows(sp,conc,windows=7,fade=0.25)
Takes as input the object containing the metrics of the models computed with cv.wpls and selects the best one based on the lowest RMSE; then computes PLS on that window and returns an object containing the results.
sel.best.window(wpls = NULL)
wpls |
object obtained from cv.wpls. |
An object containing the results of the best model. It has the same content as a model obtained from the pls function of mdatools.
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
best.pls=sel.best.window(mywpls)
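The selection step can be illustrated with a small base R sketch. The list `cv_rmse` is a hypothetical stand-in for the metrics stored by cv.wpls (its real internal layout is not documented here), the numbers are made up, and using the cross-validation RMSE as the criterion is an assumption.

```r
# Hypothetical structure: one vector of cross-validation RMSE per window,
# with one value per number of components. Toy numbers, illustration only.
cv_rmse <- list(
  Window1 = c(2.1, 1.6, 1.4),
  Window2 = c(1.9, 0.9, 1.0),
  Window3 = c(2.4, 2.0, 1.8)
)

best_per_window <- sapply(cv_rmse, min)              # lowest RMSE in each window
best_window     <- names(which.min(best_per_window)) # window with the overall minimum
best_ncp        <- which.min(cv_rmse[[best_window]]) # components at that minimum

best_window
best_ncp
```

A final PLS model would then be refitted on the columns of the winning window, which is the role sel.best.window plays in the package.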