Class SpectralFeaturePipelines
- java.lang.Object
-
- com.tagtraum.audiokern.audioprocessor.SpectralFeaturePipelines
-
public final class SpectralFeaturePipelines extends Object
Extracts low-level audio features based on frequency-domain values. EXPERIMENTAL!
- Author:
- Hendrik Schreiber
-
-
Nested Class Summary
Nested Classes

static class SpectralFeaturePipelines.MagnitudesSumFunction<T extends AudioBuffer>
Aggregate function that joins multiple buffers by adding up their magnitudes, index by index.
-
Field Summary
Fields

static AggregateFunction<AudioBuffer,Float> MAGNITUDE_STANDARD_DEVIATION
Standard deviation of magnitudes, a.k.a. variability or flatness.
-
Method Summary
All methods are static and concrete.

static SignalPipeline<AudioBuffer,Float> createAverageRelativeSpectralEntropyPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
Creates a pipeline that converts the signal to mono, applies the given window (a window length of 65536 and a hopsize of 32768 are recommended for audio with a 44.1kHz sample rate), performs an FFT (Hamming window), maps the resulting spectrum onto the cent scale, wraps the result into a single octave, smooths it, and then calculates the relative entropy for every window.

static SignalPipeline<AudioBuffer,Float> createAverageSpectralBrightnessPipeline(String id, int windowSize, int hopsize, float cutOffFrequency, int maxFramesToProcess)

static SignalPipeline<AudioBuffer,Float> createAverageSpectralCentroidPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
Average of the spectral centroids computed for individual windows of the given length and hopsize.

static SignalPipeline<AudioBuffer,Float> createAverageSpectralFlatnessPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
Average spectral flatness over maxFramesToProcess frames, similar to MPEG-7 ASF.

static SignalPipeline<AudioBuffer,LinearFrequencySpectrum> createAverageSpectralFluctuationPipeline(String id, int maxFramesToProcess)
Averages the per-window sums of createFrameSummarizedSpectralFluctuationPipeline(String, int) into a single LinearFrequencySpectrum.

static SignalPipeline<AudioBuffer,Float> createAverageSpectralFluxPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)

static SignalPipeline<AudioBuffer,Float> createAverageSpectralRollOffPipeline(String id, int windowSize, int hopsize, float threshold, int maxFramesToProcess)

static SignalPipeline<AudioBuffer,Float> createAverageSpectralSpreadPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)

static SignalPipeline<AudioBuffer,Float> createAverageSpectralVariabilityPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)

static SignalPipeline<AudioBuffer,LinearFrequencySpectrum> createFrameSummarizedSpectralFluctuationPipeline(String id, int maxFramesToProcess)
Creates a pipeline that converts the signal to mono, applies a 1024-sample window with a hopsize of 512, then applies a Hamming window followed by an FFT.

static SignalPipeline<AudioBuffer,AudioBuffer> createOnsetStrengthPipeline(int startTime, int duration)

static SignalProcessor[] createOnsetStrengthProcessors(int startTime, int duration, int lowFrequency, int highFrequency, float onsetFactor, SignalProcessor... tail)
Creates an array of processors that can be used to form a SignalPipeline that will deliver OnsetStrength values.

static SignalProcessor[] createOnsetStrengthProcessors(int startTime, int duration, SignalProcessor... tail)
Creates an array of processors that can be used to form a SignalPipeline that will deliver OnsetStrength values.

static SignalPipeline<AudioBuffer,Float> createSpectralFluctuationPeakPipeline(String id, int maxFramesToProcess)
Determines the peak of createAverageSpectralFluctuationPipeline(String, int).

static SignalPipeline<AudioBuffer,Float> createStandardDeviationSpectralVariabilityPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)

static SignalPipeline<AudioBuffer,LinearFrequencySpectrum> createSummarizedSpectralFluctuationPipeline(String id, int maxFramesToProcess)
Sums the per-window sums of createFrameSummarizedSpectralFluctuationPipeline(String, int) into a single LinearFrequencySpectrum.
-
-
-
Field Detail
-
MAGNITUDE_STANDARD_DEVIATION
public static AggregateFunction<AudioBuffer,Float> MAGNITUDE_STANDARD_DEVIATION
Standard deviation of magnitudes, a.k.a. variability or flatness.
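To make the aggregate concrete, here is a minimal, self-contained sketch of what a standard deviation over a buffer's magnitude values computes. The class and method names are illustrative, not part of the library, and this is the textbook (population) definition, not the library's actual implementation.

```java
// Illustrative sketch: standard deviation over magnitude values,
// as an AggregateFunction like MAGNITUDE_STANDARD_DEVIATION might compute it.
// Hypothetical class/method names; NOT the library's code.
class MagnitudeStats {
    static double standardDeviation(float[] magnitudes) {
        double mean = 0;
        for (float m : magnitudes) mean += m;
        mean /= magnitudes.length;
        double variance = 0;
        for (float m : magnitudes) variance += (m - mean) * (m - mean);
        variance /= magnitudes.length; // population variance
        return Math.sqrt(variance);
    }

    public static void main(String[] args) {
        float[] mags = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(standardDeviation(mags)); // prints 2.0
    }
}
```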
-
-
Method Detail
-
createOnsetStrengthPipeline
public static SignalPipeline<AudioBuffer,AudioBuffer> createOnsetStrengthPipeline(int startTime, int duration)
-
createOnsetStrengthProcessors
public static SignalProcessor[] createOnsetStrengthProcessors(int startTime, int duration, SignalProcessor... tail)
Creates an array of processors that can be used to form a SignalPipeline that will deliver OnsetStrength values. The signal is decimated to 11025 Hz, Hamming-windowed (window size 1024, hopsize 512), Fourier-transformed, bandpass-filtered to 30-720 Hz, and then the onset strength values are determined.
- Parameters:
startTime - start time in seconds
duration - duration in seconds
tail - signal processors to append to the produced processor array
- Returns:
- array of processors
-
createOnsetStrengthProcessors
public static SignalProcessor[] createOnsetStrengthProcessors(int startTime, int duration, int lowFrequency, int highFrequency, float onsetFactor, SignalProcessor... tail)
Creates an array of processors that can be used to form a SignalPipeline that will deliver OnsetStrength values. The signal is decimated to 11025 Hz, Hamming-windowed (window size 1024, hopsize 512), Fourier-transformed, and then the onset strength values are determined.
- Parameters:
startTime - start time in seconds
duration - duration in seconds
lowFrequency - lower boundary for the bandpass in Hz (e.g. 30Hz)
highFrequency - upper boundary for the bandpass in Hz (e.g. 720Hz)
onsetFactor - the factor by which a power value has to be greater than the power value of the previous frame (e.g. 1.76f)
tail - signal processors to append to the produced processor array
- Returns:
- array of processors
- See Also:
OnsetStrength
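The onsetFactor rule described above (a frame's power must exceed the previous frame's power by a given factor) can be sketched as follows. This is an illustrative reduction of the idea to per-frame powers; the class name and the exact OnsetStrength computation in the library may differ.

```java
// Illustrative sketch of the onsetFactor rule: flag an onset when a frame's
// power exceeds the previous frame's power by at least the given factor.
// Hypothetical class/method names; NOT the library's OnsetStrength code.
class OnsetSketch {
    static int countOnsets(double[] framePowers, double onsetFactor) {
        int onsets = 0;
        for (int i = 1; i < framePowers.length; i++) {
            if (framePowers[i] > onsetFactor * framePowers[i - 1]) onsets++;
        }
        return onsets;
    }

    public static void main(String[] args) {
        double[] powers = {1.0, 2.0, 2.1, 5.0};
        // frames 1 and 3 exceed 1.76x the previous frame's power
        System.out.println(countOnsets(powers, 1.76)); // prints 2
    }
}
```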
-
createAverageSpectralFlatnessPipeline
public static SignalPipeline<AudioBuffer,Float> createAverageSpectralFlatnessPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
Average spectral flatness over maxFramesToProcess frames, similar to MPEG-7 ASF. The default frequency range is 250Hz - 16kHz (n=-8, bands=24). The window size may be equal to the hopsize. MPEG-7 recommends a window length of 30ms (for a 44.1kHz sample rate this means a window size of 1323 samples).
- Parameters:
id - result id
windowSize - window size
hopsize - hopsize
maxFramesToProcess - max frames to process
- Returns:
- pipeline
- See Also:
- Spectral Flatness on Wikipedia,
AudioSpectrumFunctions.createSpectralFlatnessFunction(int, int)
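For reference, spectral flatness is conventionally the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below shows that standard definition for a single window; class and method names are illustrative, and the library's MPEG-7-style band grouping is not reproduced here.

```java
// Illustrative sketch of spectral flatness for one window:
// geometric mean / arithmetic mean of the power spectrum.
// ~1.0 for a flat (noise-like) spectrum, near 0 for a tonal one.
// Hypothetical class/method names; NOT the library's implementation.
class SpectralFlatness {
    static double flatness(double[] power) {
        double logSum = 0, sum = 0;
        for (double p : power) {
            logSum += Math.log(p); // geometric mean computed in log domain
            sum += p;
        }
        double geometricMean = Math.exp(logSum / power.length);
        double arithmeticMean = sum / power.length;
        return geometricMean / arithmeticMean;
    }

    public static void main(String[] args) {
        System.out.println(flatness(new double[]{3, 3, 3, 3})); // ~1.0, flat spectrum
        System.out.println(flatness(new double[]{8, 1, 1}));    // ~0.6, more tonal
    }
}
```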
-
createAverageSpectralCentroidPipeline
public static SignalPipeline<AudioBuffer,Float> createAverageSpectralCentroidPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
Average of the spectral centroids computed for individual windows of the given length and hopsize.
- Parameters:
id - result id
windowSize - window size
hopsize - hopsize
maxFramesToProcess - max frames to process
- Returns:
- pipeline
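The per-window centroid that gets averaged is, by the standard definition, the magnitude-weighted mean frequency of the spectrum. A minimal sketch (hypothetical class/method names, not the library's code):

```java
// Illustrative sketch: spectral centroid of one window's spectrum,
// i.e. the magnitude-weighted mean frequency.
// Hypothetical class/method names; NOT the library's implementation.
class SpectralCentroid {
    static double centroid(double[] frequencies, double[] magnitudes) {
        double weighted = 0, total = 0;
        for (int i = 0; i < magnitudes.length; i++) {
            weighted += frequencies[i] * magnitudes[i];
            total += magnitudes[i];
        }
        return weighted / total;
    }

    public static void main(String[] args) {
        double[] freqs = {100, 200, 300};
        double[] mags = {1, 0, 1};
        System.out.println(centroid(freqs, mags)); // prints 200.0
    }
}
```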
-
createAverageSpectralSpreadPipeline
public static SignalPipeline<AudioBuffer,Float> createAverageSpectralSpreadPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
-
createAverageSpectralFluxPipeline
public static SignalPipeline<AudioBuffer,Float> createAverageSpectralFluxPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
-
createAverageSpectralVariabilityPipeline
public static SignalPipeline<AudioBuffer,Float> createAverageSpectralVariabilityPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
-
createStandardDeviationSpectralVariabilityPipeline
public static SignalPipeline<AudioBuffer,Float> createStandardDeviationSpectralVariabilityPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
-
createAverageSpectralRollOffPipeline
public static SignalPipeline<AudioBuffer,Float> createAverageSpectralRollOffPipeline(String id, int windowSize, int hopsize, float threshold, int maxFramesToProcess)
- Parameters:
id - id to collect the result
windowSize - window size
hopsize - hopsize
threshold - threshold in percent, typically 0.85 or 0.95
maxFramesToProcess - max number of frames to process; the following frames are ignored
- Returns:
- average roll-off frequency
- See Also:
AudioSpectrumFunctions.createRollOffFunction(float)
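The roll-off for a single window is conventionally the lowest frequency below which the given fraction (e.g. 0.85) of the total spectral power lies. The sketch below uses squared magnitudes (power), a common convention that the library may or may not share; class and method names are illustrative.

```java
// Illustrative sketch: spectral roll-off of one window, using power
// (squared magnitudes) as a common convention -- an assumption here.
// Hypothetical class/method names; NOT the library's implementation.
class SpectralRollOff {
    static double rollOff(double[] frequencies, double[] magnitudes, double threshold) {
        double total = 0;
        for (double m : magnitudes) total += m * m;
        double cumulative = 0;
        for (int i = 0; i < magnitudes.length; i++) {
            cumulative += magnitudes[i] * magnitudes[i];
            // first bin at which the cumulative power reaches the threshold
            if (cumulative >= threshold * total) return frequencies[i];
        }
        return frequencies[frequencies.length - 1];
    }

    public static void main(String[] args) {
        double[] freqs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] mags = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(rollOff(freqs, mags, 0.85)); // prints 9.0
    }
}
```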
-
createAverageSpectralBrightnessPipeline
public static SignalPipeline<AudioBuffer,Float> createAverageSpectralBrightnessPipeline(String id, int windowSize, int hopsize, float cutOffFrequency, int maxFramesToProcess)
- Parameters:
id - id to collect the result
windowSize - window size
hopsize - hopsize
cutOffFrequency - cut-off frequency in Hz
maxFramesToProcess - max number of frames to process; the following frames are ignored
- Returns:
- average spectral brightness (brightness values are computed for each frame and then those values are averaged)
- See Also:
AudioSpectrumFunctions.createBrightnessFunction(float)
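Brightness for a single frame is commonly defined as the fraction of spectral energy at or above the cut-off frequency (the MIRToolbox-style definition). The sketch below follows that convention; whether the library uses power or raw magnitudes, and how it treats the bin containing the cut-off, are assumptions here, and the names are illustrative.

```java
// Illustrative sketch: spectral brightness of one frame as the
// fraction of spectral energy at/above the cut-off frequency.
// Power (squared magnitudes) is an assumed convention.
// Hypothetical class/method names; NOT the library's implementation.
class SpectralBrightness {
    static double brightness(double[] frequencies, double[] magnitudes, double cutOff) {
        double above = 0, total = 0;
        for (int i = 0; i < magnitudes.length; i++) {
            double energy = magnitudes[i] * magnitudes[i];
            total += energy;
            if (frequencies[i] >= cutOff) above += energy;
        }
        return above / total;
    }

    public static void main(String[] args) {
        double[] freqs = {100, 200, 300, 400};
        double[] mags = {1, 1, 1, 1};
        System.out.println(brightness(freqs, mags, 250)); // prints 0.5
    }
}
```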
-
createAverageRelativeSpectralEntropyPipeline
public static SignalPipeline<AudioBuffer,Float> createAverageRelativeSpectralEntropyPipeline(String id, int windowSize, int hopsize, int maxFramesToProcess)
Creates a pipeline that converts the signal to mono, applies the given window (a window length of 65536 and a hopsize of 32768 are recommended for audio with a 44.1kHz sample rate), performs an FFT (Hamming window), maps the resulting spectrum onto the cent scale, wraps the result into a single octave, smooths it, and then calculates the relative entropy for every window. These entropy values are then averaged.
This is similar (but not identical) to the MIRToolbox expression
mirentropy(mirspectrum(x,'Collapsed','Min',40,'Smooth',70,'Frame',1.5,.5))
. The main difference lies in the length of the FFT. The MIRToolbox version ensures that the FFT delivers a bandwidth slightly smaller than 1 cent of the minimum bandwidth (40Hz), which leads to excessively large FFTs (2,097,152 samples for a sample rate of 44.1kHz). This version does not ensure a small enough bandwidth, but simply uses the given window length. Depending on the window length, this leads to wrong results in the lower cent bins, but in the end it does not affect the overall relative entropy too much. The cent spectrum itself is also computed differently: this version does not try to interpolate at all.
- Parameters:
id - id
windowSize - window size, 65536 recommended for a 44.1kHz sample rate (roughly 1.5s)
hopsize - hopsize, half the window size is recommended
maxFramesToProcess - max frames to process
- Returns:
- averaged relative entropy
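The relative-entropy step for a single window can be sketched as the Shannon entropy of the normalized spectrum divided by its maximum possible value, log(N), so the result lies in [0, 1]. This matches the usual mirentropy-style normalization but is an assumption about this library; class and method names are illustrative, and the cent-scale mapping and smoothing described above are not reproduced.

```java
// Illustrative sketch: relative entropy of one (already cent-mapped,
// wrapped, smoothed) spectrum -- normalized Shannon entropy in [0, 1].
// 1.0 = perfectly flat spectrum, 0.0 = all energy in a single bin.
// Hypothetical class/method names; NOT the library's implementation.
class RelativeEntropy {
    static double relativeEntropy(double[] magnitudes) {
        double sum = 0;
        for (double m : magnitudes) sum += m;
        double entropy = 0;
        for (double m : magnitudes) {
            double p = m / sum;              // normalize to a distribution
            if (p > 0) entropy -= p * Math.log(p);
        }
        return entropy / Math.log(magnitudes.length); // divide by max entropy
    }

    public static void main(String[] args) {
        System.out.println(relativeEntropy(new double[]{1, 1, 1, 1})); // ~1.0
        System.out.println(relativeEntropy(new double[]{1, 0, 0, 0})); // ~0.0
    }
}
```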
-
createSummarizedSpectralFluctuationPipeline
public static SignalPipeline<AudioBuffer,LinearFrequencySpectrum> createSummarizedSpectralFluctuationPipeline(String id, int maxFramesToProcess)
Sums the per-window sums of createFrameSummarizedSpectralFluctuationPipeline(String, int) into a single LinearFrequencySpectrum.
This approach is very similar to mirfluctuation(audio, 'Summary') from MIRToolbox.
Note that the absolute values depend on the length of the provided audio and the parameter maxFramesToProcess.
- Parameters:
id - id
maxFramesToProcess - max audio frames to process
- Returns:
- a LinearFrequencySpectrum that contains the mentioned sums
- See Also:
OuterEarModel.Terhardt, AuditoryMasking.SCHROEDER_MAGNITUDE_MASK, BandSplit, MultiBand, createFrameSummarizedSpectralFluctuationPipeline(String, int)
-
createAverageSpectralFluctuationPipeline
public static SignalPipeline<AudioBuffer,LinearFrequencySpectrum> createAverageSpectralFluctuationPipeline(String id, int maxFramesToProcess)
Averages the per-window sums of createFrameSummarizedSpectralFluctuationPipeline(String, int) into a single LinearFrequencySpectrum. The result is the average fluctuation over all bark bands and over all frames/windows.
- Parameters:
id - id
maxFramesToProcess - max audio frames to process
- Returns:
- a LinearFrequencySpectrum that contains the mentioned averages
- See Also:
OuterEarModel.Terhardt, AuditoryMasking.SCHROEDER_MAGNITUDE_MASK, BandSplit, MultiBand, createFrameSummarizedSpectralFluctuationPipeline(String, int)
-
createSpectralFluctuationPeakPipeline
public static SignalPipeline<AudioBuffer,Float> createSpectralFluctuationPeakPipeline(String id, int maxFramesToProcess)
Determines the peak of createAverageSpectralFluctuationPipeline(String, int).
- Parameters:
id - id
maxFramesToProcess - max frames to process
- Returns:
- pipeline
-
createFrameSummarizedSpectralFluctuationPipeline
public static SignalPipeline<AudioBuffer,LinearFrequencySpectrum> createFrameSummarizedSpectralFluctuationPipeline(String id, int maxFramesToProcess)
Creates a pipeline that converts the signal to mono, applies a 1024-sample window with a hopsize of 512, then applies a Hamming window followed by an FFT. Then: Terhardt outer ear model, grouping into bark bands, masking using the Schroeder et al. spreading function, conversion to dB, and an FFT along the bark bands. The resulting spectra for all bands in one window are summed.
- Parameters:
id - id
maxFramesToProcess - max audio frames to process
- Returns:
- a LinearFrequencySpectrum that contains the mentioned sums
- See Also:
OuterEarModel.Terhardt, AuditoryMasking.SCHROEDER_MAGNITUDE_MASK, BandSplit, MultiBand
-
-