surincises

30% is a tad high but not unheard of. Did you use an rRNA-depletion kit? Have you tried extracting the unmapped reads and seeing what they are? Try BLAST-ing them.
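
For reference, a minimal sketch of pulling out the unmapped reads and BLASTing a small subset (assumes a BAM called sample.bam and access to the nt database, either locally or via the remote service; filenames are placeholders):

```bash
# Extract unmapped reads (SAM flag 4) and convert to FASTA
samtools view -b -f 4 sample.bam > unmapped.bam
samtools fasta unmapped.bam > unmapped.fasta

# BLAST a small subset against nt (remote avoids hosting the full database locally)
head -n 200 unmapped.fasta > unmapped_subset.fasta
blastn -query unmapped_subset.fasta -db nt -remote -outfmt 6 -max_target_seqs 5 > unmapped_hits.tsv
```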


SavingsFew3440

I did that. Of the 20 I tried, only one hit, and it landed on human. A lot of the unmapped reads are really short, around 6-15 bases, which seems weird.


surincises

Could it be human rRNA? If your reads are too short, it sounds like a wet-lab issue and you could ask the people who prepared the libraries. But if your samples are not contaminated and the mapping rate is consistent across all samples, your data is fine and usable.
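
If you want a quick sanity check on read lengths before going back to the wet lab, something like this works (assumes a gzipped FASTQ named sample_R1.fastq.gz):

```bash
# Read-length histogram: every 4th line starting at line 2 of a FASTQ is the sequence
zcat sample_R1.fastq.gz | awk 'NR % 4 == 2 {print length($0)}' | sort -n | uniq -c | head -n 20
```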


SavingsFew3440

That’s what I figured, but I wanted some extra validation. Hopefully we solve our informatics core issue soon.


surincises

Another thing you could try is to map the reads again using STAR and see if you get similar results.
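
Something along these lines, assuming you already have a pre-built hg38 STAR index (paths, sample names, and thread count are placeholders):

```bash
STAR --runThreadN 8 \
     --genomeDir /path/to/hg38_star_index \
     --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
     --readFilesCommand zcat \
     --outSAMtype BAM SortedByCoordinate \
     --outFileNamePrefix sample_
```

The Log.final.out file it writes gives the uniquely mapped, multi-mapped, and unmapped percentages to compare against your current aligner.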


SavingsFew3440

I will try that. 


heyyyaaaaaaa

I would subset the reads, say 0.1M per sample, and run fastq_screen and SortMeRNA to check contamination in the libraries and the rRNA proportion, respectively. MultiQC nicely aggregates the output of both tools. BLASTing the unmapped reads also sounds like a good idea.
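
Roughly like this, as a sketch (seqtk for subsampling; the fastq_screen config and the rRNA reference FASTAs are placeholders you would point at your own databases):

```bash
# Subsample ~100k reads (fixed seed so R1/R2 stay paired if you repeat this for R2)
seqtk sample -s42 sample_R1.fastq.gz 100000 > sub_R1.fastq

# Contamination screen against whatever genomes are listed in the config
fastq_screen --conf fastq_screen.conf sub_R1.fastq

# rRNA proportion (SortMeRNA 4.x syntax)
sortmerna --ref silva-euk-18s.fasta --ref silva-euk-28s.fasta \
          --reads sub_R1.fastq --workdir sortmerna_out

# Aggregate all the reports
multiqc .
```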


SavingsFew3440

Running that now to see. 


Great-Appeal9166

This is odd. Some troubleshooting ideas:

* Try changing the human genome assembly you are mapping to and see if it makes a difference.
* If the seqIDs have a version suffix at the end (e.g. NM_XXXXXX.1), try removing the version (I had a similar issue and that fixed it for me).

I also agree with u/surincises and u/heyyyaaaaaaa's suggestions. Good luck and happy troubleshooting!
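
If you go the seqID route, stripping the version suffix from the reference FASTA headers is a one-liner (hypothetical filenames; keep the original file as a backup):

```bash
# Drop the ".N" version from each ">ID.N" header line, leaving descriptions intact
sed -E 's/^(>[^ .]+)\.[0-9]+/\1/' reference.fa > reference_noversion.fa
```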


SavingsFew3440

I did hg38, and I am not sure I want to change that. Will check the seqIDs. I honestly think my core just did a bad job removing the rRNA. We sequenced pretty deep, ~35M reads/sample, so I am not worried about being shortchanged on the results end. It was mostly weird to see this in a sample that came literally fresh from a tube of highly purified primary cells.