An alternative way of using TreeMix is not with original populations, but with allele frequencies derived from ADMIXTURE components. ADMIXTURE outputs a P file of such frequencies, which can be easily converted into the desired counts. See the technical note at the bottom of the post.
Application to K12b components
I have applied this idea to the K12b components. Here is the tree with no migration edges, using the Sub_Saharan component as an outgroup:
West Eurasian components that I've labeled "the Six" group together, but the Northwest African one is intermediate between the others and the two African components.
Now, let's allow one migration edge:
Now, the Northwest African component seems derived from what could be called "Indigenous Northwest Africans" but there is a migration edge going to it from a southern Caucasoid population.
Let's allow two migration edges:
Now, there appears to be some gene flow from what appears to be an early Proto-Eurasian population into Southwest Asians. This may be consistent with my idea that the Southwest Asian component represents an amalgam of Neolithic migrants from the "core area" with pre-existing inhabitants of the southern Near East.
With three migration edges:
There now appears to be some gene flow from East Africa to southern Caucasoids. One might speculate that this has something to do with the dispersal of Y-haplogroup E1b1b and/or mtDNA haplogroup M1?
With four migration edges:
There now appears to be gene flow from the Atlantic_Med into the North_European component. Does that indicate the absorption by the ancestors of the North Europeans of an Oetzi-like substratum?
With five migration edges:
There now appears to be some input into the Siberian component from a Proto-North European one. This may be related to steppe-related dispersals of northern Caucasoids in Siberia during the Eneolithic and later times, and/or Proto-Europoids like Kostenki and its eastern relatives?
Notice also that the North_European input into Siberian appears to precede the Atlantic_Med input into North_European. This is consistent with an eastern origin of North_European, which absorbed Atlantic_Med/Oetzi-like populations in Europe and contributed to the East Eurasian native population in Siberia.
Strength of the Edges
The strength of the edges (for the -m 5 run) is:
- 73% of Southwest_Asian -> Northwest_African
- 7% of East_African -> Southwest_Asian/Caucasus/Atlantic_Med group
- 42% of Atlantic_Med -> North_European
- 18% of Proto-Eurasian -> Southwest_Asian
- 19% of North_European -> Siberian
Notice that these are inferred contributions between ADMIXTURE components. Extant populations are composed of different proportions of these components.
Technical Note
Here is how to convert ADMIXTURE output into TreeMix allele counts. This relies on the P and Q files output by ADMIXTURE software.
The P file is an MxK matrix, where:
M: number of SNPs
K: number of components
Suppose one entry in this array is, say, 0.6.
This is consistent with a 6,4 entry in the corresponding TreeMix input file, because the first allele has a frequency of 6/(6+4) = 0.6, and the other allele has a frequency of 0.4 = 4/(4+6).
But, it's also consistent with e.g., 12,8 or 24,16, etc.
You can figure out how many alleles in total to use by exploiting the Q file, which is an IxK matrix, where:
I: number of individuals
K: number of components
If you sum up one of the columns in this array, you get a number of equivalent "individuals" that each ADMIXTURE component corresponds to, based on the original ADMIXTURE run.
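As a rough illustration of this column sum, here is a minimal Python sketch. The function name effective_individuals and the toy Q matrix are made up for the example; a real Q file would be read from ADMIXTURE's output, one whitespace-separated row per individual.

```python
def effective_individuals(q_rows):
    """Sum each column of an I x K membership matrix given as a list of rows."""
    k = len(q_rows[0])
    return [sum(row[j] for row in q_rows) for j in range(k)]


# Toy Q matrix (hypothetical): 3 individuals, 2 components; each row sums to 1.
q = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.25, 0.75],
]

print(effective_individuals(q))  # [1.75, 1.25]
```

The two sums add up to 3, the number of individuals, so each component receives a share of the total sample.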
The entries in the TreeMix input file are then like this, for the k-th population and m-th SNP:
2 * sum(Q[,k]) * P[m, k] , 2 * sum(Q[,k]) * (1-P[m, k])
You might want to round up these numbers, as they may not be exact integers.
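The whole conversion can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name treemix_lines, the component names and the toy matrices are invented for the example, and rounding to the nearest integer with round() is one possible reading of the rounding step, not necessarily the one used for the runs above. The output follows the TreeMix input layout of a header line with population names, then one line per SNP with a "count1,count2" pair for each population.

```python
def treemix_lines(names, p_rows, q_rows):
    """Build TreeMix input lines from ADMIXTURE-style P (M x K) and Q (I x K) matrices."""
    k = len(names)
    # Total alleles per component: 2 * sum(Q[,k])
    n_alleles = [2 * sum(row[j] for row in q_rows) for j in range(k)]
    lines = [" ".join(names)]  # header: one name per component
    for p_row in p_rows:
        entries = []
        for j in range(k):
            # Assumption: round to the nearest integer.
            first = round(n_alleles[j] * p_row[j])
            second = round(n_alleles[j] * (1 - p_row[j]))
            entries.append(f"{first},{second}")
        lines.append(" ".join(entries))
    return lines


# Toy data (hypothetical): 7 individuals, 2 components, 2 SNPs.
names = ["CompA", "CompB"]
q = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 2   # column sums: 5 and 2
p = [
    [0.6, 0.25],
    [0.1, 0.5],
]

for line in treemix_lines(names, p, q):
    print(line)
# CompA CompB
# 6,4 1,3
# 1,9 2,2
```

With a column sum of 5, the frequency 0.6 becomes the 6,4 entry from the example above. TreeMix expects the resulting file to be gzipped before it is passed as input.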