Examples in which the EM algorithm for GMMs is insufficient on its own, but a visual modelling approach is appropriate, can be found in [Ultsch et al., 2015]. In general, a GMM is explainable if the overlap between the Gaussians remains small. A good example of modelling such a GMM on natural data can be found in the ECDA presentation available on ResearchGate [Thrun/Ultsch, 2015].
In the example below, the data is generated specifically such that the resulting GMM is statistically significant. The interactive approach of AdaptGauss uses shiny. Hence, I do not know how to illustrate these examples in Rmarkdown.
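To make the notion of "small overlap" more concrete, the following base-R sketch computes one possible overlap measure between two weighted Gaussian components: the area under the pointwise minimum of their weighted densities. The helper `component_overlap` and the choice of measure are assumptions made for illustration; they are not part of AdaptGauss, and the cited papers may quantify overlap differently.

```r
# Hypothetical helper (not part of AdaptGauss): area under the pointwise
# minimum of two weighted Gaussian densities, approximated on a fine grid.
component_overlap <- function(m1, s1, w1, m2, s2, w2, n = 20001) {
  # Cover both components out to 8 standard deviations
  lo <- min(m1 - 8 * s1, m2 - 8 * s2)
  hi <- max(m1 + 8 * s1, m2 + 8 * s2)
  x <- seq(lo, hi, length.out = n)
  # Pointwise minimum of the two weighted densities
  f <- pmin(w1 * dnorm(x, m1, s1), w2 * dnorm(x, m2, s2))
  # Riemann sum over the equally spaced grid
  sum(f) * (x[2] - x[1])
}

# Two well-separated components versus two strongly overlapping ones
separated   <- component_overlap(-2, 0.5, 1/3, 2, 1, 1/3)
overlapping <- component_overlap(0, 1, 0.5, 0.5, 1, 0.5)
print(c(separated = separated, overlapping = overlapping))
```

Under this measure, two identical components overlap by exactly their common weight, while well-separated components give a value close to zero.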
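# Three Gaussian components with known parameters
data = c(rnorm(3000, 2, 1), rnorm(3000, 7, 3), rnorm(3000, -2, 0.5))
# Interactive (shiny) model adaptation, initialized with starting values
gmm = AdaptGauss::AdaptGauss(data, Means = c(-2, 2, 7), SDs = c(0.5, 1, 4), Weights = c(0.3333, 0.3333, 0.3333))
# Chi-square test and QQ plot of the resulting model
AdaptGauss::Chi2testMixtures(data, gmm$Means, gmm$SDs, gmm$Weights, PlotIt = TRUE)
AdaptGauss::QQplotGMM(data, gmm$Means, gmm$SDs, gmm$Weights)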
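Since the shiny interface cannot be rendered in a static document, a rough non-interactive illustration is sketched here using only base R. It overlays the density of a three-component mixture on a histogram of the generated data. The helper `gmm_pdf`, the fixed seed, and the use of the true generating parameters (rather than parameters returned by AdaptGauss) are assumptions for this sketch.

```r
# Base-R sketch (not the AdaptGauss API): density of a Gaussian mixture
gmm_pdf <- function(x, means, sds, weights) {
  # One column per component: weight * N(mean, sd) evaluated at x
  comps <- mapply(function(m, s, w) w * dnorm(x, m, s), means, sds, weights)
  # matrix() keeps the layout stable even when x has length 1
  rowSums(matrix(comps, nrow = length(x)))
}

set.seed(42)  # assumed seed, only for reproducibility of the sketch
data <- c(rnorm(3000, 2, 1), rnorm(3000, 7, 3), rnorm(3000, -2, 0.5))

# True generating parameters, used here in place of an interactively adapted model
means   <- c(-2, 2, 7)
sds     <- c(0.5, 1, 3)
weights <- rep(1 / 3, 3)

grid <- seq(min(data), max(data), length.out = 512)
dens <- gmm_pdf(grid, means, sds, weights)

# Histogram on the density scale with the mixture density drawn on top
hist(data, breaks = 100, freq = FALSE, main = "Data and mixture density")
lines(grid, dens, col = "red", lwd = 2)
```

A model adapted interactively could be checked the same way by passing its `Means`, `SDs`, and `Weights` to `gmm_pdf`.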
Not every multimodal dataset should be modelled with GMMs. The following is an example of a model of a multimodal dataset that is not statistically significant.
# Multimodal real-world dataset
data('LKWFahrzeitSeehafen2010')
# Interactive (shiny) model adaptation, initialized with four components
gmm = AdaptGauss::AdaptGauss(LKWFahrzeitSeehafen2010, Means = c(52.74, 385.38, 619.46, 162.08), SDs = c(38.22, 93.21, 57.72, 48.36), Weights = c(0.2434, 0.5589, 0.1484, 0.0749))
# Chi-square test and QQ plot of the resulting model
AdaptGauss::Chi2testMixtures(LKWFahrzeitSeehafen2010, gmm$Means, gmm$SDs, gmm$Weights, PlotIt = TRUE)
AdaptGauss::QQplotGMM(LKWFahrzeitSeehafen2010, gmm$Means, gmm$SDs, gmm$Weights)
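The idea of testing whether a mixture model fits the data can be sketched in base R as a chi-square goodness-of-fit test on binned data. This is only an illustration of the general principle and is not necessarily how `Chi2testMixtures` computes its result. The helpers `gmm_cdf` and `chi2_gmm`, the quantile-based binning, the synthetic data, and the deliberately shifted means are all assumptions of this sketch.

```r
# Base-R sketch (not the AdaptGauss API): cumulative distribution of a mixture
gmm_cdf <- function(q, means, sds, weights) {
  comps <- mapply(function(m, s, w) w * pnorm(q, m, s), means, sds, weights)
  rowSums(matrix(comps, nrow = length(q)))
}

# Chi-square goodness of fit of binned data against a fixed mixture model
chi2_gmm <- function(x, means, sds, weights, n_bins = 20) {
  # Interior bin edges at empirical quantiles; the outer bins are open-ended.
  # Assumes continuous data, so that the quantiles are distinct.
  inner <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))[2:n_bins]
  edges <- c(-Inf, unname(inner), Inf)
  observed <- as.vector(table(cut(x, breaks = edges)))
  expected_p <- diff(gmm_cdf(edges, means, sds, weights))
  # Parameters are treated as fixed, so the degrees of freedom are not
  # corrected for estimation; rescale.p guards against rounded weights.
  chisq.test(observed, p = expected_p, rescale.p = TRUE)
}

set.seed(1)  # assumed seed, only for reproducibility of the sketch
x <- c(rnorm(3000, 2, 1), rnorm(3000, 7, 3), rnorm(3000, -2, 0.5))

# Model with the generating parameters versus a model with shifted means
fit_true    <- chi2_gmm(x, c(-2, 2, 7), c(0.5, 1, 3), rep(1 / 3, 3))
fit_shifted <- chi2_gmm(x, c(0, 4, 9), c(0.5, 1, 3), rep(1 / 3, 3))
print(c(true = unname(fit_true$statistic), shifted = unname(fit_shifted$statistic)))
```

A very small p-value, as for the shifted model here, indicates that the binned data deviate from the model far more than chance would explain, which is the situation the example above describes.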
Thrun, M. C., & Ultsch, A.: Models of Income Distributions for Knowledge Discovery, Proc. European Conference on Data Analysis (ECDA), DOI: 10.13140/RG.2.1.4463.0244, pp. 136-137, Colchester, 2015.
Ultsch, A., Thrun, M. C., Hansen-Goos, O., & Lötsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International Journal of Molecular Sciences, Vol. 16(10), pp. 25897-25911, 2015.