After multiple rounds of experimenting, I found that an alternative way to run RNA-seq mapping with STAR on Stampede2 is to use all the cores of a whole node (16 cores) at the same time. The DDR RAM for a node on Stampede2 is 96 GB, which may not be enough to handle multiple independent mapping jobs. We can use multiple threads for STAR…
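As a rough illustration of the multi-threaded approach, the sketch below maps samples one at a time, giving each STAR run all 16 threads so that only one genome index sits in memory. The index directory, sample names, and FASTQ file names are hypothetical placeholders, not taken from the original post, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: one STAR job at a time, each using every core on the node.
threads=16                        # cores per node, as stated in the post
genome_dir="daphnia_star_index"   # hypothetical path to a STAR genome index

# Print the STAR command for one sample (hypothetical single-end FASTQ naming).
build_star_cmd() {
    echo "STAR --runThreadN ${threads} --genomeDir ${genome_dir} --readFilesIn $1.fastq --outSAMtype BAM SortedByCoordinate --outFileNamePrefix $1_"
}

# Samples are processed sequentially; pipe the output to sh to actually run it.
for sample in sampleA sampleB; do
    build_star_cmd "$sample"
done
```

Running the samples sequentially with many threads trades some parallel efficiency for a much smaller memory footprint than launching several independent STAR processes at once.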
Category: Bioinformatic analysis
Use Launcher to bundle multiple serial jobs on Stampede2
As I was exploring Stampede2 at TACC for more RNA-seq data analysis, I needed to map the RNA-seq reads for 68 samples to the Daphnia genome. Because even a single serial job using 1 CPU on Stampede2 takes up the resources of an entire node of 16 CPUs, I decided to use the Launcher tool that is built in…
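A minimal sketch of how serial mapping jobs might be bundled for Launcher: the tool reads a plain-text job file and runs each line as an independent task. The sample names, index directory, and job file name below are hypothetical, and the Launcher setup shown in the trailing comments is an assumed typical configuration rather than the author's actual script.

```shell
#!/bin/sh
# Build a Launcher job file with one serial STAR command per line.
samples="sampleA sampleB sampleC"   # hypothetical; the real run had 68 samples
jobfile="star_jobs.txt"             # hypothetical job file name

: > "$jobfile"                      # start from an empty file
for s in $samples; do
    echo "STAR --runThreadN 1 --genomeDir daphnia_star_index --readFilesIn ${s}.fastq --outFileNamePrefix ${s}_" >> "$jobfile"
done

# Inside the SLURM batch script, Launcher would then be pointed at the file
# (assumed setup, shown as comments so this sketch runs anywhere):
#   module load launcher
#   export LAUNCHER_WORKDIR=$PWD
#   export LAUNCHER_JOB_FILE=star_jobs.txt
#   $LAUNCHER_DIR/paramrun
```

Each line of the job file must be a complete command, since Launcher hands lines out to the available cores one by one.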
An easy way to run parallel jobs on Stampede2
A new allocation on the TACC Stampede2 supercomputer has been awarded to our lab, and I put it to use right away. My understanding is that Stampede2 is best suited for highly parallel computing jobs, especially because service units are calculated for the whole node (of 16 cores). So even if only 1 core/cpu is…
Estimating within population/species PI (nucleotide diversity)
I learned a new bioinformatics trick today while using VCFtools to analyze nucleotide diversity for the Amazon molly (Poecilia formosa), an intriguing fish that reproduces asexually by gynogenesis. To calculate within-species/population Pi from a big VCF file containing multiple populations, provide a list containing all the individuals’ names from one population in a file (e.g., LIST), and use the following command “vcftools…
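The exact command is cut off in the excerpt above, so the following is only a sketch of one common way to do this with VCFtools, not necessarily the author's invocation. It assumes a hypothetical tab-separated population map (`popmap.txt`) and VCF name (`all_pops.vcf`), writes the individuals of one population to LIST, and prints a command that uses the `--keep` and `--site-pi` options.

```shell
#!/bin/sh
# Hypothetical population map: individual name <TAB> population label.
printf 'ind1\tpopA\nind2\tpopA\nind3\tpopB\n' > popmap.txt

# Collect the individuals of one population into LIST, one name per line.
awk -F'\t' '$2 == "popA" {print $1}' popmap.txt > LIST

# Per-site nucleotide diversity restricted to those individuals.
# Printed rather than executed so the sketch runs without VCFtools installed.
pi_cmd="vcftools --vcf all_pops.vcf --keep LIST --site-pi --out popA"
echo "$pi_cmd"
```

For windowed estimates, `--window-pi` with a window size in base pairs can be used in place of `--site-pi`.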