As best I can determine, the VK545 sample from the Ship Street site in Dublin, Ireland, was aligned and analyzed using the GRCh37/hg19 reference. YFull shows this sample as R1b-A225 under R1b-A223. However, GRCh37/hg19 is an older assembly and no longer the best reference for Y-DNA work.
[ https://www.yfull.com/branch-info/R-Y3646/#t4-tab ]
Given the enhanced coverage YFull is seeing with current Y-DNA sequencing using the CP086569.2 T2T reference, I thought it would be interesting to realign the VK545 sample to it as well. I downloaded the VK545.final.bam file from ENA and realigned it in three steps: Samtools to convert the BAM file to FASTQ, BWA-MEM to realign the reads to the CP086569.2 reference, and Samtools again to remove duplicates. I then used Bcftools mpileup/call to generate a VCF. Some intermediate steps in the pipeline are left out here for simplicity. I realigned the whole genome, not just chrY.
[ https://www.ebi.ac.uk/ena/browser/view/ ... show=reads ]
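For anyone who wants to try something similar, the realignment steps described above can be sketched roughly as follows. This is not the exact script used here, only a minimal outline. The file names, the thread count, the `samtools fixmate -m` step, the indexing step, and haploid calling (`--ploidy 1`) are all assumptions on my part, and the reference FASTA is assumed to have been indexed beforehand with `bwa index`. The function and variable names are made up for illustration.

```python
import shlex
import subprocess


def build_steps(bam, reference, prefix, threads=4):
    """Return the pipeline as a list of steps.

    Each step is a list of commands whose stdout/stdin are chained with pipes.
    """
    fixmate_bam = f"{prefix}.fixmate.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    dedup_bam = f"{prefix}.dedup.bam"
    vcf = f"{prefix}.vcf.gz"
    return [
        # BAM -> FASTQ -> BWA-MEM. fixmate -m adds the tags markdup relies on
        # (an assumed intermediate step, not confirmed by the post).
        [
            ["samtools", "fastq", bam],
            ["bwa", "mem", "-t", str(threads), reference, "-"],
            ["samtools", "fixmate", "-m", "-", fixmate_bam],
        ],
        [["samtools", "sort", "-o", sorted_bam, fixmate_bam]],
        # -r removes duplicates instead of only flagging them.
        [["samtools", "markdup", "-r", sorted_bam, dedup_bam]],
        [["samtools", "index", dedup_bam]],
        # Haploid calling is an assumption, suggested by AN=1 in the results.
        [
            ["bcftools", "mpileup", "-Ou", "-f", reference, dedup_bam],
            ["bcftools", "call", "-mv", "--ploidy", "1", "-Oz", "-o", vcf],
        ],
    ]


def preview(step):
    """Render one step as the equivalent shell command line."""
    return " | ".join(shlex.join(cmd) for cmd in step)


def run_step(step):
    """Run the commands of one step, piping each stdout into the next stdin."""
    procs = []
    upstream = None
    for i, cmd in enumerate(step):
        last = i == len(step) - 1
        proc = subprocess.Popen(
            cmd, stdin=upstream, stdout=None if last else subprocess.PIPE
        )
        if upstream is not None:
            upstream.close()  # the child process keeps its own copy of the pipe
        upstream = proc.stdout
        procs.append(proc)
    for cmd, proc in zip(step, procs):
        if proc.wait() != 0:
            raise RuntimeError(f"command failed: {shlex.join(cmd)}")
```

Calling `build_steps("VK545.final.bam", "t2t_reference.fa", "VK545")` (the reference file name is hypothetical) and printing `preview(step)` for each step shows the commands without running anything. Passing each step to `run_step` executes them in order, provided samtools, bwa, and bcftools are installed.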
Doing this, I got a very interesting result. I had previously written a PHP script to help me analyze VCFs. It is geared towards kits that have upgraded their NGS results, not towards ancient Y-DNA, but it still works well enough for ancient Y-DNA. It first looks for all known variants upstream of R1b-DF104 in the sample VCF, confirms them, and then ignores them. It then looks for all known variants under R1b-DF104 and reports those, along with any unknown variants. Here is a table of the results:
TIER | POS | ID | REF | ALT | QUAL | FILTER | INFO |
1 | 13391744 | DF105 | G | A | 46.4146 | . | CLADE=R1b-DF105;DP=2 |
1 | 21254183 | DF109 | A | T | 8.99921 | . | CLADE=R1b-DF105;DP=1 |
2 | 13575770 | FGC65032 | C | T | 8.99921 | . | CLADE=R1b-FGC65031;DP=1 |
4 | 20100623 | MF165555 | C | T | 7.30814 | . | CLADE=R1b-FTB39547;DP=1 |
5 | 20588609 | BY18188 | C | T | 7.30814 | . | CLADE=R1b-BY18120;DP=1 |
5 | 20721167 | FT119260 | G | A | 8.99921 | . | CLADE=R1b-FT115566;DP=1 |
6 | 11711058 | FGC8438 | G | A | 5.75677 | . | CLADE=R1b-BY18320;DP=1 |
6 | 16714581 | A11307 | G | A | 8.99921 | . | CLADE=R1b-A11307;DP=1 |
7 | 10467844 | FTB91864 | G | A | 8.99921 | . | CLADE=R1b-BY48495;DP=1 |
7 | 20877246 | FTC5774 | G | A | 8.13869 | . | CLADE=R1b-FTC5557;DP=1 |
8 | 7304629 | BY18132 | G | A | 8.99921 | . | CLADE=R1b-BY18132;DP=1 |
8 | 7318089 | M9520 | C | T | 8.99921 | . | CLADE=R1b-B24;DP=1 |
8 | 8634006 | Y129700 | G | A | 8.99921 | . | CLADE=R1b-FT285980;DP=1 |
8 | 17468716 | FTB12149 | G | A | 8.99921 | . | CLADE=R1b-BY47745;DP=1 |
8 | 18252929 | BY20817 | G | A | 8.99921 | . | CLADE=R1b-FT82182;DP=1 |
8 | 20099847 | BY132284 | G | A | 8.99921 | . | CLADE=R1b-BY146806;DP=1 |
9 | 6894312 | PH432 | G | A | 3.22451 | . | CLADE=R1b-FT109536;DP=1 |
9 | 15071932 | BY106599 | G | A | 8.99921 | . | CLADE=R1b-BY98600;DP=1 |
10 | 3436741 | FTC32491 | C | T | 8.99921 | . | CLADE=R1b-FTA14514;DP=1 |
10 | 15154507 | FGC62848 | G | A | 8.13869 | . | CLADE=R1b-FGC62843;DP=1 |
10 | 15525920 | Y52254 | C | T | 3.22451 | . | CLADE=R1b-BY18352;DP=1 |
11 | 8341530 | BY73963 | G | A | 5.04598 | . | CLADE=R1b-BY65078;DP=1 |
12 | 16208403 | Y26014 | G | A | 8.99921 | . | CLADE=R1b-Y26014;DP=1 |
13 | 17421295 | FT207643 | C | T | 8.99921 | . | CLADE=R1b-BY137737;DP=1 |
13 | 27258802 | FT178070 | C | T | 8.99921 | . | CLADE=R1b-BY16967;DP=1 |
14 | 12603472 | FGC19840 | G | A | 6.51248 | . | CLADE=R1b-FGC19856;DP=1 |
100 | 13157239 | . | G | A | 78.4149 | . | QD=26;DP=3;VDB=0.470313;SGB=-0.511536;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,1,2;MQ=60 |
100 | 15092006 | . | A | G | 104.415 | . | QD=26;DP=4;VDB=0.0320192;SGB=-0.556411;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,1,3;MQ=60 |
100 | 16408342 | . | G | A | 74.4149 | . | QD=24;DP=3;VDB=0.0900131;SGB=-0.511536;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,2,1;MQ=60 |
100 | 18136756 | . | G | A | 77.4149 | . | QD=25;DP=3;VDB=0.71601;SGB=-0.511536;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,2,1;MQ=60 |
100 | 18189304 | . | C | T | 62.4147 | . | QD=20;DP=3;VDB=0.318564;SGB=-0.511536;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,3,0;MQ=60 |
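The classification that produced the table above can be sketched in a few lines. My actual script is PHP and is not reproduced here; this is a simplified Python outline of the same idea, with made-up function names. The lookup tables are assumed to be keyed by position and alleles, the `chrY` contig name and the `UPSTREAM_EXAMPLE` entry are hypothetical placeholders, and only the DF105 row and one unknown row come from the table.

```python
UNKNOWN_TIER = 100  # tier value the table uses for variants with no known name


def parse_vcf_line(line):
    """Parse the first eight tab-separated VCF columns into a dict."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    return {
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": None if qual == "." else float(qual),
        "info": info,
    }


def classify(vcf_lines, upstream, downstream):
    """Confirm upstream variants, then report known and unknown calls.

    upstream:   {(pos, ref, alt): name}
    downstream: {(pos, ref, alt): (name, clade, tier)}
    Returns (confirmed upstream names, reported calls sorted by tier and position).
    """
    confirmed, known, unknown = [], [], []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        call = parse_vcf_line(line)
        key = (call["pos"], call["ref"], call["alt"])
        if key in upstream:
            confirmed.append(upstream[key])  # confirmed, then left out of the report
        elif key in downstream:
            name, clade, tier = downstream[key]
            known.append({**call, "id": name, "clade": clade, "tier": tier})
        else:
            unknown.append({**call, "id": ".", "clade": None, "tier": UNKNOWN_TIER})
    known.sort(key=lambda c: (c["tier"], c["pos"]))
    unknown.sort(key=lambda c: c["pos"])
    return confirmed, known + unknown


# Hypothetical upstream entry; the position and name are placeholders.
UPSTREAM = {(1000000, "C", "T"): "UPSTREAM_EXAMPLE"}
# One real row from the table above.
DOWNSTREAM = {(13391744, "G", "A"): ("DF105", "R1b-DF105", 1)}

EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "chrY\t1000000\t.\tC\tT\t50\t.\tDP=2",
    "chrY\t13157239\t.\tG\tA\t78.4149\t.\tDP=3",
    "chrY\t13391744\t.\tG\tA\t46.4146\t.\tDP=2",
]
```

Running `classify(EXAMPLE_VCF, UPSTREAM, DOWNSTREAM)` confirms the placeholder upstream variant and reports DF105 at tier 1, followed by the unnamed variant at tier 100.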
Since the read depth is mostly 1, the result is, of course, somewhat problematic. DF105 does have 2 reads, and DF109 has 1 read. The other calls under R1b-DF104 are probably noise or contamination given their low QUAL scores, a weakness DF109 shares. I don't have a tool to examine whether DF108 is negative or a no-call in the realigned BAM file. The 5 variants at the bottom, however, are relatively strong, with read depths of 3 and 4 and QUAL scores above 60, provided they are not sequencing artifacts, and I think it unlikely that they are.
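To make the weak-versus-strong split concrete, here is a small sketch that reads rows in the same pipe-delimited layout as the table and separates them by read depth. The threshold of 3 reads is only an assumption chosen to match the depths mentioned above, the function names are made up, and the INFO strings of the two unnamed rows are shortened for readability.

```python
ROWS = """\
1 | 13391744 | DF105 | G | A | 46.4146 | . | CLADE=R1b-DF105;DP=2 |
1 | 21254183 | DF109 | A | T | 8.99921 | . | CLADE=R1b-DF105;DP=1 |
2 | 13575770 | FGC65032 | C | T | 8.99921 | . | CLADE=R1b-FGC65031;DP=1 |
100 | 13157239 | . | G | A | 78.4149 | . | QD=26;DP=3;MQ=60 |
100 | 15092006 | . | A | G | 104.415 | . | QD=26;DP=4;MQ=60 |
"""


def info_depth(info):
    """Return the DP value from a VCF-style INFO string, or None if absent."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "DP":  # exact match, so DP4 is not mistaken for DP
            return int(value)
    return None


def parse_row(row):
    """Turn one pipe-delimited table row into a dict."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    tier, pos, vid, ref, alt, qual, flt, info = cells
    return {
        "tier": int(tier),
        "pos": int(pos),
        "id": vid,
        "qual": float(qual),
        "depth": info_depth(info),
    }


def split_by_depth(text, min_depth=3):
    """Return (strong, weak) calls; min_depth is an assumed cutoff."""
    strong, weak = [], []
    for row in text.splitlines():
        if not row.strip():
            continue
        call = parse_row(row)
        depth = call["depth"] or 0  # a missing DP counts as depth 0
        (strong if depth >= min_depth else weak).append(call)
    return strong, weak
```

With these rows, `split_by_depth(ROWS)` puts the two unnamed variants in the strong group and DF105, DF109, and FGC65032 in the weak group, which is why the named calls need corroboration before they can be trusted.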
At this point, it is unlikely that these 5 variants are upstream of R1b-DF104. So IF DF108 is negative in the VK545 sample, the sample MAY split the R1b-DF105 phylogenetic node and sit in a parallel branch defined by those 5 variants. Since we have not seen this in current testing, perhaps this is an extinct branch. If so, that is very interesting.
On the other hand, if DF108 is a no-call and likely positive, then I would assume that the VK545 sample represents a potentially new direct subclade under R1b-DF105, or possibly sits within one of the other direct subclades, depending on whether those variant positions are negative or no-calls. This is still quite interesting.
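The distinction between negative and no-call that both scenarios hinge on can be stated as a simple rule over the bases observed at a position. The sketch below assumes the observed bases have already been pulled from the BAM file by some other tool. The function name is made up, and the C/T alleles in the usage note are placeholders, not the real DF108 alleles.

```python
def call_status(observed_bases, ref, alt):
    """Classify a position from the bases seen in the reads covering it.

    Returns "no-call" when no read shows the reference or the alternate
    allele, "negative" when only the reference allele is seen, "positive"
    when only the alternate allele is seen, and "mixed" when both appear.
    """
    bases = [b.upper() for b in observed_bases]
    n_ref = bases.count(ref.upper())
    n_alt = bases.count(alt.upper())
    if n_ref == 0 and n_alt == 0:
        return "no-call"  # no coverage, or only uninformative bases
    if n_alt == 0:
        return "negative"
    if n_ref == 0:
        return "positive"
    return "mixed"  # possible damage or contamination in an ancient sample
```

For example, `call_status([], "C", "T")` gives `"no-call"` while `call_status(["C", "C"], "C", "T")` gives `"negative"`. Only the second outcome would support the parallel-branch reading, and with ancient DNA even a single-read negative deserves caution.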
Regardless, it seems clear that the VK545 sample is NOT R1b-A225+, since there are no confirming reads for it. I don't know whether anyone else has done or is doing realignment of ancient samples, but from my experiment it appears to be a worthwhile effort. Further, if anyone has the IGV tool or something similar, I will be happy to provide a copy of the realigned BAM file for viewing. Please PM me if you are interested.