Channel: Maia Atlantis: Ancient World Blogs

Clusters Galore analysis of West Eurasians

It's been a while since the last Clusters Galore analysis, so I've decided to use my recently assembled dataset to run one over individuals whose ancestry falls almost entirely within the Six main West Eurasian components.

To begin, I identified 945 individuals in my dataset whose combined admixture proportions in the Six exceeded 95%. I then ran MDS on this set, retaining 50 dimensions.
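The selection and MDS steps can be sketched as follows. This is a minimal illustration, not the code used for this analysis: it assumes a hypothetical ADMIXTURE Q matrix `admix` (individuals by components), a hypothetical list `six` of the column indices of the Six, and a precomputed pairwise genetic distance matrix `dist`. The MDS shown is generic classical (Torgerson) MDS; the post does not say which distance or MDS implementation was used.

```python
import numpy as np


def select_core(proportions, core_cols, threshold=0.95):
    """Boolean mask of rows whose summed proportions in `core_cols` exceed `threshold`."""
    proportions = np.asarray(proportions, dtype=float)
    return proportions[:, core_cols].sum(axis=1) > threshold


def classical_mds(dist, n_dims):
    """Classical (Torgerson) MDS: embed a distance matrix in `n_dims` coordinates."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    n_dims = min(n_dims, n)
    # Double-centre the squared distances to recover a Gram (inner-product) matrix.
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh handles symmetric matrices; sort eigenvalues in descending order.
    evals, evecs = np.linalg.eigh(gram)
    order = np.argsort(evals)[::-1][:n_dims]
    # Small negative eigenvalues (non-Euclidean distances, rounding) are set to zero.
    evals = np.clip(evals[order], 0.0, None)
    return evecs[:, order] * np.sqrt(evals)


# Hypothetical usage (names are illustrative only):
# mask = select_core(admix, six)                      # > 95% combined in the Six
# coords = classical_mds(dist[np.ix_(mask, mask)], 50)
```

The strict `>` mirrors "more than 95%"; each returned column is one MDS dimension, ordered from largest to smallest eigenvalue.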

One open issue in Clusters Galore analysis is how many MDS dimensions to retain. So far, I've used a heuristic: choose the number of MDS dimensions that maximizes the number of clusters inferred by MCLUST. However, when I actually inspect the MDS plots, meaningful information often seems to be present at even higher MDS dimensions. As a result, I've decided to choose the number of dimensions as follows.
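For comparison, the older heuristic (keep the number of leading MDS dimensions that maximizes the inferred cluster count) might look like the sketch below. MCLUST is an R package; here scikit-learn's `GaussianMixture` with BIC-based model selection is swapped in as a rough Python stand-in, with a single full-covariance model rather than MCLUST's family of covariance parameterizations. The function names and the `max_components` cap are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def count_clusters(X, max_components=10, seed=0):
    """Number of Gaussian components (1..max_components) with the lowest BIC."""
    bics = []
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              n_init=3, random_state=seed).fit(X)
        bics.append(gmm.bic(X))
    # scikit-learn's BIC is "lower is better"; MCLUST reports it with the opposite sign.
    return int(np.argmin(bics)) + 1


def choose_n_dims(coords, candidate_dims, max_components=10, seed=0):
    """Keep the number of leading MDS dimensions that maximizes the number of
    inferred clusters. Ties go to the first candidate listed."""
    counts = {d: count_clusters(coords[:, :d], max_components, seed)
              for d in candidate_dims}
    best = max(candidate_dims, key=lambda d: counts[d])
    return best, counts
```

Because every candidate requires a full model search, this heuristic is expensive, and, as noted above, it can stop short of dimensions that still carry visible structure.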

The main idea is that data points in uninformative MDS dimensions will look largely like Gaussian noise. So, we can use a test of normality (I've chosen the Shapiro-Wilk test) to detect dimensions that do not look like noise. Below are the p-values of this test for the different MDS dimensions:
For the first 22 dimensions, there is a strong non-Gaussian signal (all p-values are below 0.001). Hence, I used the first 22 dimensions in the MCLUST analysis. With these dimensions, MCLUST estimated the number of clusters as 35, roughly a 6-fold increase in resolution over the Six components inferred by ADMIXTURE.
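The dimension-selection rule can be sketched with SciPy's `scipy.stats.shapiro`. This is one reading of "up to 22 dimensions": keep the leading run of dimensions whose p-values are all below 0.001, stopping at the first dimension that looks Gaussian. The function names are hypothetical, and `coords` is assumed to be the individuals-by-dimensions MDS matrix.

```python
import numpy as np
from scipy import stats


def shapiro_pvalues(coords):
    """Shapiro-Wilk p-value for each column (MDS dimension) of `coords`."""
    coords = np.asarray(coords, dtype=float)
    pvalues = []
    for j in range(coords.shape[1]):
        _, p = stats.shapiro(coords[:, j])  # returns (statistic, p-value)
        pvalues.append(float(p))
    return pvalues


def leading_informative(pvalues, alpha=1e-3):
    """Count the leading dimensions whose p-values are all below `alpha`,
    stopping at the first dimension that is consistent with Gaussian noise."""
    n = 0
    for p in pvalues:
        if p >= alpha:
            break
        n += 1
    return n


# Hypothetical usage:
# n_keep = leading_informative(shapiro_pvalues(coords))
# clustering_input = coords[:, :n_keep]
```

A small p-value only says a dimension is unlikely to be pure Gaussian noise; it does not guarantee the structure is biologically meaningful.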

The cluster totals for the different populations can be seen in the spreadsheet.
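Tabulating such totals is a simple count of cluster memberships per population. A minimal sketch, assuming the hypothetical input is one `(population, cluster)` pair per individual; the population names and counts in the example are placeholders, not results from this analysis.

```python
from collections import Counter, defaultdict


def cluster_totals(assignments):
    """Tally cluster memberships per population.

    `assignments` is an iterable of (population, cluster) pairs, one per
    individual; returns {population: Counter({cluster: count})}.
    """
    totals = defaultdict(Counter)
    for population, cluster in assignments:
        totals[population][cluster] += 1
    return dict(totals)


# Placeholder data for illustration only:
rows = [("Pop_A", 20), ("Pop_A", 20), ("Pop_A", 14), ("Pop_B", 28)]
print(cluster_totals(rows)["Pop_A"])  # Counter({20: 2, 14: 1})
```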

Important Caveat: Some populations (e.g., Finnish_D or Turkish_D) have many individuals who do not meet the "95% in the Six" inclusion threshold. Hence, the results are not representative of these populations; they simply indicate the cluster assignments of the subsets that do meet the threshold. You can check whether individuals have been removed from the original dataset by comparing the sample sizes in the Clusters Galore spreadsheet with those in the K12a one.
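The sample-size comparison described in the caveat can be automated. A minimal sketch, assuming the two spreadsheets have been read into hypothetical dictionaries mapping population name to sample size; the names and numbers in the check are placeholders.

```python
def removed_individuals(galore_sizes, k12a_sizes):
    """Compare per-population sample sizes between two tables.

    Returns {population: (kept, original)} for every population that has
    fewer individuals in the Clusters Galore table than in the K12a table.
    """
    report = {}
    for population, original in k12a_sizes.items():
        kept = galore_sizes.get(population, 0)  # absent means everyone was filtered out
        if kept < original:
            report[population] = (kept, original)
    return report
```

Populations where `kept` is a small fraction of `original` are the ones whose cluster totals should be read with the caveat above in mind.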

Here are the 35 clusters, each labeled with its modal population (or region):
  1. Ashkenazi
  2. Scandinavian
  3. French
  4. British Isles
  5. Armenian
  6. S Italian/Sicilian
  7. Kurd
  8. Greek
  9. Cypriot
  10. Balto-Slavic
  11. Hungarian
  12. Balkan
  13. Sephardic
  14. Spanish
  15. Iberian
  16. North Italian/Tuscan
  17. Morocco Jews (main)
  18. Saudis
  19. Georgian/Abkhazian
  20. Basque
  21. Bedouin
  22. Druze #1
  23. Druze #2
  24. Druze (main)
  25. Mozabite (main)
  26. Mozabite #1
  27. Orkney
  28. Sardinian
  29. Azerbaijan Jews
  30. Iran/Iraq Jews
  31. Lezgins
  32. Morocco Jews #1
  33. Samaritan
  34. Yemen Jews
  35. Abkhazian
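Labeling each cluster by its modal population can be sketched as below. This assumes a hypothetical list of `(population, cluster)` pairs and uses raw counts; the labels above also group some populations into regions, which this sketch does not attempt. Placeholder names are used in the check.

```python
from collections import Counter, defaultdict


def modal_populations(assignments):
    """Label each cluster with its most frequent (modal) population.

    `assignments` is an iterable of (population, cluster) pairs. Ties are
    broken in favour of the population encountered first.
    """
    by_cluster = defaultdict(Counter)
    for population, cluster in assignments:
        by_cluster[cluster][population] += 1
    # Counter.most_common orders equal counts by first insertion.
    return {cluster: counts.most_common(1)[0][0]
            for cluster, counts in by_cluster.items()}
```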
