[bioinfo] Tips for SNP calling with gatk on data mapped with STAR
Running gatk MarkDuplicates on a BAM file produced by mapping with STAR prints the following:
[Tue Dec 17 20:53:31 JST 2024] Executing as igawa@igawa-XXXX on Linux 5.19.0-32-generic amd64; OpenJDK 64-Bit Server VM 22.0.1-internal-adhoc.conda.src; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.6.1.0
INFO 2024-12-17 20:53:31 MarkDuplicates Start of doWork freeMemory: 88456088; totalMemory: 117440512; maxMemory: 16642998272
INFO 2024-12-17 20:53:31 MarkDuplicates Reading input file and constructing read end information.
INFO 2024-12-17 20:53:31 MarkDuplicates Will retain up to 60300718 data points before spilling to disk.
[Tue Dec 17 20:53:31 JST 2024] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=897581056
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.NullPointerException: Cannot invoke "htsjdk.samtools.SAMReadGroupRecord.getReadGroupId()" because the return value of "htsjdk.samtools.SAMRecord.getReadGroup()" is null
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:558)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:270)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:281)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
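At first glance it looks as if the tool ran, but it actually exits with an error. The cause appears to be that the header of the BAM file output by STAR has no read group (@RG) line. The following command adds one: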
samtools addreplacerg -r "@RG\tID:RG1\tSM:SampleName\tPL:Illumina\tLB:Library.fa" -o <output.bam> <input.bam>
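Running this resolves the error, since samtools addreplacerg writes the @RG line into the header and tags the reads with that read group.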
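
To check whether a BAM file needs this fix, and to confirm it afterwards, it may help to look for an @RG line in the header. The sketch below is only an illustration: `has_read_group` is a hypothetical helper name, and `input.bam` and `fixed.bam` are placeholder file names, none of which come from the original run. The helper reads header text on stdin, so it works with the output of `samtools view -H`.

```shell
#!/bin/sh
# has_read_group (hypothetical helper): reads SAM header text on stdin and
# returns exit status 0 if at least one @RG line is present, non-zero otherwise.
has_read_group() {
    grep -q '^@RG'
}

# Example usage, assuming samtools is installed (file names are placeholders):
#   samtools view -H input.bam | has_read_group || echo "no @RG line in header"
#   samtools view -H fixed.bam | grep '^@RG'    # print the @RG line after the fix
```

Alternatively, STAR can write the read group at mapping time through its `--outSAMattrRGline` option (for example `--outSAMattrRGline ID:RG1 SM:SampleName PL:Illumina`), which would presumably avoid the extra samtools step; this was not tried in the run shown above.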