Friday, November 22, 2013
Almost a year ago, amidst great marketing fanfare and expectations from customers, 23andMe rolled out the Ancestry Composition (AC). This wasn't just supposed to be an updated Ancestry Painting (the company's previous main ancestry tool), but a state-of-the-art global and local ancestry deconvolution analysis that would put all competitors to shame.
Alas, not everything went according to plan.
For one, 23andMe's decision to represent Europe as ten ancestral regions, but not break up Sub-Saharan Africa or eastern Asia, was met with disbelief from many of the punters. Secondly, the AC suffered from an overfitting problem.
Overfitting happens when the individuals being tested are also used as reference samples, so they end up with inflated ancestry proportions based on their self-reported ancestry. For more info check out these threads at the 23andMe forums here and here. I was one of the overfitted customers, and it really pissed me off.
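To make the problem concrete, here's a minimal toy sketch in Python. It is not 23andMe's algorithm, which I haven't seen. It's just a nearest-neighbour estimator I made up for illustration, with invented genotypes and labels, to show what happens when a customer is allowed to match themselves in the reference panel.

```python
from collections import Counter

def distance(a, b):
    """Toy genetic distance: summed allele-count differences across SNPs."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ancestry_proportions(genotype, panel, k=2):
    """Share of each label among the k closest reference samples."""
    nearest = sorted(panel, key=lambda ref: distance(genotype, ref[1]))[:k]
    counts = Counter(label for label, _ in nearest)
    return {label: n / k for label, n in counts.items()}

# Made-up reference panel: (self-reported label, genotype at six SNPs).
panel = [
    ("A", [0, 0, 0, 0, 0, 0]),
    ("A", [0, 0, 0, 0, 0, 1]),
    ("B", [2, 2, 2, 2, 2, 2]),
    ("B", [2, 2, 2, 2, 2, 1]),
]

# A customer who self-reports as "A" but is genetically much closer to "B".
customer = ("A", [2, 2, 2, 2, 1, 1])

# Overfitted: the customer is also in the panel, so they match themselves.
with_self = ancestry_proportions(customer[1], panel + [customer])

# Leave-one-out: the customer is kept out of their own reference panel.
without_self = ancestry_proportions(customer[1], panel)

print(with_self)     # {'A': 0.5, 'B': 0.5}
print(without_self)  # {'B': 1.0}
```

With the customer in the panel, half of the result simply echoes their self-reported label. Leave them out and the estimate reflects only their genotype.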
23andMe tried to fix the overfitting problem about six weeks after the launch, but that didn't go too well at all. See here.
Last week 23andMe announced that it would again attempt to fix the overfitting problem, and also break up Sub-Saharan Africa and eastern Asia into three and five regions, respectively. I'm not sure if the process has started yet, but you can get updates on how things are going here. You'll see question marks next to the new reference sets in the AC until your updated ancestry proportions have been computed.
I was very excited about the AC when it was first announced, but now, after a year of waiting for the overfitting fix, I find the whole thing very underwhelming. I might, at some stage, write a review and user guide, but then again I might not.
Sunday, November 3, 2013
Exciting times are ahead for those of us with a passion for the genetic history of Poland. The newly formed Poznan Centre for Archaeogenomics (PCA) has just announced a major ancient DNA project to study the origins of the population of Greater Poland and Poland's earliest rulers, the Piast Dynasty.
Greater Poland, known as Wielkopolska in Polish, is located in Western Poland. This is where the Piast kingdom first emerged around 960 AD, and then expanded to eventually become the Kingdom of Poland (see map below, from Wikipedia). So it's basically the cradle of the nation. Indeed, the present-day capital of Greater Poland, the city of Poznan, was the first capital of the Kingdom of Poland.
The main goals of the project are to test the level of genetic continuity in Greater Poland from the Iron Age to the Middle Ages (i.e. either side of the so-called Migration Period of the early Middle Ages), characterize the biogeographic origins of the Piast Dynasty, and compare the early Polish ruling elite to the peasants in terms of genetic structure.
The work will be carried out over a five-year period, and as far as I can tell from the source linked to below, the aim is to fully sequence as many genomes as possible from the hundreds of ancient skeletons stored at Polish museums and universities. That sort of resolution should make it possible to easily meet the project goals. The press release doesn't say where the sequencing will be done, but a state-of-the-art ancient DNA lab was launched in Poznan about a year ago, so that looks like the most likely place (see here).
It's probably an understatement to say that the origin of the first Slavic tribes on Polish territory is a major sticking point among Polish archeologists and historians. There are two main competing theories: a local Polish origin (the autochthonous theory) and a Pripet Marsh origin (the allochthonous theory). The ethnogenesis of the Piast Dynasty is also something of a mystery, with some scholars suggesting they were originally of Danish Viking stock. It'll be nice to finally see these issues resolved once and for all.
Nauka w Polsce: Naukowcy chcą zbadać pochodzenie Wielkopolan (Science in Poland: Scientists want to investigate the origins of the people of Greater Poland)
First direct evidence of genetic continuity in West and Central Poland from the Iron Age to the present
Polish "Goths" enjoyed their millet, while Polish "Vikings" did not
Monday, October 28, 2013
A paper in the AJHG describes a new cost-effective method of significantly increasing the amount of authentic DNA output from ancient samples:
By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold.
This is particularly good news for studies which aim to extract autosomal DNA from hundreds of ancient remains, like Gothenburg University's The Rise project, which I excitedly blogged about earlier this year (see here). In fact, I suspect the aforementioned Danish hair sample is one of the samples from The Rise dataset. I say that because Morten Allentoft is a co-author on this paper, and he's also doing the DNA analysis for The Rise (see here).
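Going back to the enrichment figures quoted above, the arithmetic is simple enough to sketch in a few lines of Python. I'm assuming here that fold enrichment is just the mapped fraction after capture divided by the mapped fraction before capture. The paper may define it more carefully (for instance, after removing duplicate reads), and the read counts below are invented for a hypothetical library, not taken from the study.

```python
def mapped_fraction(mapped_reads, total_reads):
    """Share of sequencing reads that map to the human genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads

def fold_enrichment(pre_fraction, post_fraction):
    """How many times larger the mapped fraction is after capture."""
    if pre_fraction <= 0:
        raise ValueError("pre_fraction must be positive")
    return post_fraction / pre_fraction

# Made-up read counts for one hypothetical library (not from the paper).
pre = mapped_fraction(12_000, 1_000_000)    # 1.2% human before capture
post = mapped_fraction(360_000, 1_000_000)  # 36% human after capture
fold = fold_enrichment(pre, post)

print(f"{pre:.1%} -> {post:.1%}: roughly {fold:.0f}-fold enrichment")
```

The point is that a library which starts out almost entirely non-human can end up with a large share of usable reads, which is what makes sequencing hundreds of ancient samples affordable.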
In any case, below are two global Principal Component Analyses (PCAs) featuring one of the ancient Bulgarians (V2) and the ancient Dane (M4). The principal components (PC1 & 2) were computed using only modern samples, and the ancient samples were then projected onto the PCA space.
The ancient Bulgarian is sitting more or less where modern Bulgarians are usually found on such global plots. On the other hand, the ancient Dane is clearly shifted towards East Asia, and as a result clusters with Finns, which I suppose is somewhat unexpected because that never happens with modern Danes. So either there's a problem with the analysis, like, say, projection bias (see below for more details), or this Bronze Age Dane was in fact more eastern in terms of global genetic affinities than modern Danes. The latter might well be true if, for instance, he was a recent descendant of migrants from the east (like present-day Russia), and/or he harbored more Mesolithic hunter-gatherer ancestry than Danes do today.
Now, here are a couple of PCAs limited to European samples from the supplemental data PDF, including another ancient Bulgarian (K8) and the same ancient Dane (M4). Unfortunately, PC1 appears to be mostly a reflection of the well documented and very recent founder effect and strong genetic drift experienced by the Finnish population. In other words, it's not saying much more than the fact that the ancient samples weren't affected by the same demographic events and genetic drift as Finns during the past few hundred years. It might have been possible to get more informative results by reducing the Finnish sample to only a handful of the least drifted (i.e. least Finnish-like) individuals.
Moreover, it's curious that both ancient samples land in more or less the middle of their respective plots in PC2, despite the fact that they come from very different parts of Europe. I suspect that in these instances projection bias is indeed the problem.
Projection bias is similar to the "calculator effect" (see here), but it affects PCAs, especially those that include only closely related populations, like from Europe. For more background see Haasl et al. 2012 and Lee et al. 2012.
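For anyone unsure what "projecting" a sample means in practice, here's a bare-bones Python sketch of the approach described above: the principal axis is computed from the modern samples only, and the ancient sample is placed on it afterwards. The genotypes are invented, the power iteration is my own shortcut for a toy dataset rather than what the authors used, and the example is far too small to show projection bias itself.

```python
def mean_vector(rows):
    """Per-SNP mean across individuals."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def center(row, mean):
    """Subtract the reference (modern) mean from one genotype vector."""
    return [x - m for x, m in zip(row, mean)]

def top_component(centered, iterations=100):
    """First principal axis via power iteration on X^T X."""
    dim = len(centered[0])
    v = [1.0] * dim
    for _ in range(iterations):
        scores = [sum(x * vi for x, vi in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def project(row, mean, axis):
    """Coordinate of one individual on a precomputed axis."""
    return sum(x * a for x, a in zip(center(row, mean), axis))

# Made-up allele counts (0, 1 or 2) at four SNPs for four modern individuals.
modern = [
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [2, 2, 1, 0],
    [2, 1, 1, 0],
]
ancient = [2, 2, 1, 1]  # hypothetical ancient sample, not used to build the axis

mean = mean_vector(modern)
axis = top_component([center(row, mean) for row in modern])

modern_scores = [project(row, mean, axis) for row in modern]
ancient_score = project(ancient, mean, axis)

print("modern PC1:", [round(s, 3) for s in modern_scores])
print("ancient PC1:", round(ancient_score, 3))
```

Note that the ancient sample has no say in where the axis points or where the origin sits. That's the whole idea of projection, and it's also why projected samples can end up misplaced when the reference populations are closely related.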
It's also interesting to note that two of the Iron Age Bulgarians are reported as belonging to mtDNA haplogroups U3b and HV, respectively. Both of these haplogroups are generally accepted to be of Near Eastern origin. They're rare in Europe today (usually <2%), but relatively more common in Bulgaria than most other European countries. This suggests some genetic continuity in Bulgaria from at least the Iron Age to the present. Indeed, U3 has been reported from early Neolithic samples from Germany and Ukraine, which means that the ancient Bulgarian U3 lineage need not have arrived in Europe from the Near East during the metal ages.
Carpenter et al., Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries, The American Journal of Human Genetics (2013), http://dx.doi.org/10.1016/j.ajhg.2013.10.002
Sunday, October 27, 2013
This paper by Di Cristofaro et al. might well be the turning point in modern population genetics, at least as far as Eurasia is concerned. Not only do the authors give up on the standard but dodgy method of dating Y-chromosome expansion times with Y-STR diversities, but they also conclude that, contrary to popular belief, Afghanistan and surrounds cannot be the source of any major population expansions into other parts of Eurasia. So this study really goes against the grain of what we've seen from academia in recent years, and I have to say it's very refreshing to finally read a paper like this which doesn't make the dubious claim that Y-chromosome haplogroup R1a is native to India.
Below are a few quotes and figure 2 from the study, showing the spatial distribution of six Ancestry Components (AC) from the K=9 ADMIXTURE analysis. Note the presence of the North European-specific AC4 in Central Asia, but almost complete lack of the South Asian-specific AC7 in Europe.
Given the uncertainties associated with Y-STR mutation rates together with the onset of recent estimations of the Time to Most Recent Common Ancestor (TMRCA) of the various branching events in SNP based Y phylogenies using ‘complete’ Y sequences [74–76], in prudence, we choose not to estimate expansion times based on Y-STR diversities.
Our autosomal and haploid data suggested that the Afghan Hindu Kush populations exhibit a blend of components from Europe, the Caucasus, Middle East, East and South Asia. This juxtaposition of autosomal and haploid markers could reflect important male and female influences contributing to the Afghan populations’ genetic make-up. Considering autosomal data, all ancestral components displayed a decreasing gradient of their frequencies when approaching Afghanistan. Finding the highest genetic frequencies in a region does not necessarily mean that this region was the original source: it has been shown that geographic distributions can result from various modalities besides natural selection such as geographic barriers, subsequent migrations, replacement, isolation, and the surfing effect. However, the fact that all the ancestral components reach a lower frequency when in Afghanistan supports the model of a convergence of migrations [87,88].
Although the modern Afghan population is made up of ethnically and linguistically diverse groups, the similarity of the underlying gene pool and its underlying gene flows from West and East Eurasia and from South Asia is consistent with prehistoric post-glacial expansions, such as an eastward migration of humans out of the Fertile Crescent in the early Neolithic period, and the arrival of northern steppe nomads speaking the Indo-Iranian variety of Indo-European languages. Taken together, these events led to the creation of a common genetic substratum that has been veneered with relatively recent cultural and linguistic differences.
Di Cristofaro J, Pennarun E, Mazières S, Myres NM, Lin AA, et al. (2013) Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge. PLoS ONE 8(10): e76748. doi:10.1371/journal.pone.0076748
Wednesday, October 16, 2013
A pre-print at arXiv argues that most Chinese paternal lineages can be grouped into three subclades within Y-chromosome haplogroup O3, and that these expanded rapidly during the East Asian Neolithic. Moreover, it includes a series of maps showing early migration routes of modern humans across Eurasia. These maps suggest that Y-chromosome haplogroups R1a and R1b broke away from R1 about 14.8K years ago somewhere in West Central Asia, and then non-Indo-European groups loaded with R1b migrated to the Atlantic fringe via a route north of the Black Sea. R1a is singled out as the Proto-Indo-European marker, which makes sense based on its latest phylogeny and elevated presence in various ancient samples (see here).
Haplogroup P diverged into Q and R at ~24.1 kya, slightly before the LGM. Most Q individuals in Han Chinese belong to the Q1a1-M120 clade, while R’s in Han Chinese are mostly R1a1-M17. The separation events of R1 and R2, and R1a and R1b are estimated here at 19.9 and 14.8 kya, respectively. R1b roamed till the Atlantic coast, forming some of the non-Indo-European groups (e.g. Basque) [32].
Yan et al., Y Chromosomes of 40% Chinese Are Descendants of Three Neolithic Super-grandfathers, arXiv:1310.3897v1 [q-bio.PE]
Sunday, September 29, 2013
The quotes below come from a press release about a thesis on the Unetice culture by Gothenburg University archeologist Dalia Pokutta. The thesis, titled "Population Dynamics, Diet and Migrations of the Únetice Culture in Poland", was defended successfully earlier this month, and if it's going to appear online it'll probably be here. By the way, ancient DNA results (including Y-DNA, mtDNA and autosomal DNA) from the Unetice samples analysed by Pokutta are on the way (see here).
'Over 3800 years ago, a young male, possibly born in Skåne, made a journey of over 900 kilometers south, to Wroclaw in Poland. He died violently in Wroclaw, killed by Úněticean farmers, possibly due to romance with two local females, who were murdered together with him. This 'Bronze Age love story', with no happy end today is the first case of Swedish-Polish contacts in history ever', concludes archaeologist Dalia Pokutta, author of the thesis.
...
'It is the biggest isotopic project undertaken in Poland so far. We analysed hundreds of samples, not only human bones, but also animals. This study deals with the humans of a long-forgotten past and figuratively speaking, it has been written by the hands of fifty dead people. This story leads us to the first Europe of metals and the beginnings of the Bronze Age world, but above all to past societies and their members. The results of the analyses went beyond our wildest dreams or expectations' says Dalia Pokutta.
...
One of the leading conclusions is a very high level of territorial mobility of the prehistoric population in Silesia, with presence of immigrants from Germany, Czechia, Hungary and Sweden. The study also confirms massive changes in European agriculture around year 2000 B.C, like the introduction of manuring on large scale. 'My study aims at a new dimension of bioarchaeology, presenting the archaeological culture through the life histories of the people: skilled astronomers and star-gazers, talented metallurgists, farmers, explorers, merchants and barrow builders; the people who laid the foundations of the first Europe of metals and the Bronze Age world', says archaeologist Dalia Pokutta.
Source: Proof of human migration from Sweden to Poland during the Early Bronze Age
Thursday, September 12, 2013
Update 13/09/2013: Actually, to run GPS, users can input what seem to be Geno 2.0 autosomal ancestry proportions into the relevant fields on this page. Now, for various reasons I don't have a very high opinion of the Geno 2.0 autosomal test, so it's not something I'll ever pay for. However, I quickly put together a K=9 test that roughly approximates the Geno 2.0 analysis. It's a bit noisy, but here's the GPS result it generated (using 10 reference populations), which is fairly accurate... Ad-Mix page > Eurogenes > Eurogenes K9b), but I can't guarantee it'll produce accurate results for everyone. My outcome might have been a bit of a fluke. I'd say the best thing to do is to wait until the site starts accepting raw genotype data from users (see here).
...
This thing will apparently predict your biogeographical origins down to your "home village". That's unlikely to be relevant for most personal genomics customers, but it just so happens that I do come from a small Polish village, so I'll certainly be able to test the veracity of this claim. I tried uploading my 23andMe data just now, but it told me to come back later, so I guess it's not ready yet.
The search for a biogeographical method that utilizes biological information to predict one's place of origin has occupied scientists for millennia. Modern biogeographical algorithms achieve an accuracy of 700 km in Europe but are highly inaccurate elsewhere, particularly in Southeast Asia and Oceania. Here, we present the admixture-based Geographic Population Structure (GPS) method that accurately infers the biogeography of worldwide individuals down to their village of origin. GPS' accuracy is demonstrated on three datasets: worldwide populations, Southeast Asians and Oceanians, and Sardinians (Italy) using 40,000-130,000 GenoChip markers. GPS correctly placed 80% of worldwide individuals within their country of origin with an accuracy of 87% for Asians and Oceanians. Applied to over 200 Sardinians villagers of both sexes, GPS placed a quarter of them within their villages and most of the remaining within 50 km of their villages, allowing us to identify the demographic processes that shaped the Sardinian society. Finally, we further demonstrate additional three applications of GPS in tracing the biogeographical origin of the Druze population and uncovering the European origins of North Americans. The accuracy and power of GPS underscore the promise of admixture-based methods to biogeography and has important ramifications for genetic ancestry testing, forensic and medical sciences, and genetic privacy.
Ahh, OK, so you have to be a Sardinian villager to get the most out of this tool. Well, I'm still looking forward to putting it through its paces. That link again: Geographic Population Structure (GPS) prediction.