A javascript function to validate FASTA sequences

I was more than a bit annoyed of not finding this out there in the interwebs, being a strong encourager of googling (or in Jamie’s case duck-duck-going) and re-use.

So I proffer my very own fasta validation javascript function.

/*
 * Validates (true/false) a single fasta sequence string
 * param   fasta    the string containing a putative single fasta sequence
 * returns boolean  true if string contains single fasta sequence, false 
 *                  otherwise 
 */
function validateFasta(fasta) {

	if (!fasta) { // check there is something first of all
		return false;
	}

	// immediately remove trailing spaces
	fasta = fasta.trim();

	// split on newlines... 
	var lines = fasta.split('\n');

	// check for header
	if (fasta[0] == '>') {
		// remove one line, starting at the first position
		lines.splice(0, 1);
	}

	// join the array back into a single string without newlines and 
	// trailing or leading spaces
	fasta = lines.join('').trim();

	if (!fasta) { // is it empty whatever we collected ? re-check not efficient 
		return false;
	}

	// note that the empty string is caught above
	// allow for Selenocysteine (U)
	return /^[ACDEFGHIKLMNPQRSTUVWY\s]+$/i.test(fasta);
}

Let me know, by comments below, if you spot a no-no.  Please link to this post if you find use for it.

p.s. I already noticed that this only validates one sequence.  This is because this function is taken out of one of our web servers, Memoir, which specifically only requires one sequence.  If there is interested for multi sequence validation I will add it.

 

2 thoughts on “A javascript function to validate FASTA sequences

  1. Anand

    Hi Ebejer,
    Thanks for this piece of code.

    I am looking for javascript to validate a text box entry to check if all submitted sequences are Protein sequences.

    Would it be possible to extend your code to check for multiple sequences from a text box and return true only if it is a protein sequence.

    Thanks,
    Anand

Leave a Reply