Gene Finding via TATA box search

Problem

  • Within a long region of genomic sequence, genes are also characterised by having the sequence “TATA” somewhere near the beginning of the string.
  • Write a program to prompt the user for a string of DNA bases (ACTG).
  • Search the string for the substring “TATA”.
  • If found, report the 0-based position of the first match. Otherwise, report “Not found”.

A contextually-related problem is Gene Finding via GC content.

Solution

/*
Test Case 1:
Input: TATA (example of string that is exactly what we're looking for)
Expected Output: 0
Actual Output: 0 (was 1 before adjusting i in the cout statement)
 
Test Case 2:
Input: GATATA (example of string that contains what we're looking for)
Expected Output: 2
Actual Output: 2 (was 3 before adjusting i in the cout statement)
 
Test Case 3:
Input: A (example of string that doesn't contain subsequence TATA)
Expected Output: Not found
Actual Output: Not found (was 2 before adding the if around the last cout statement)
 
Other example Test Case inputs: GTATAG (TATA is in the middle), TAT (almost TATA), TATAGCTATA (TATA appears twice), etc.
*/
 
#include <iostream>
#include <string>
#include <cstdlib> // for system()
 
using namespace std;
 
int main()
{
	// Inputs: DNA sequence
	// Outputs: First 0-based position of "TATA" found in input or "Not Found"
 
	// Define subsequence to find
	string subseq_to_find = "TATA";
 
	// Prompt user for sequence
	cout << "DNA seq please: ";
	string dna_seq;
	cin >> dna_seq;
 
	// Use a variable that will keep track whether or not we have found TATA yet
	bool found = false;
 
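	// Current position in the input sequence (unsigned, to match the type returned by length())
	string::size_type i = 0;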
 
	// As long as we haven't found TATA and as long as we haven't checked every position in the input sequence
	while (!found && i < dna_seq.length())
	{
		// check if the substring starting at the current position is the same as the subsequence we're looking for
		if (subseq_to_find == dna_seq.substr(i, subseq_to_find.length()))
		{
			// if it is, then we say we've found it
			found = true; // how exciting!
		}
		// be sure to increment the current position for the next time through the loop.
		i++;
	}
 
	if (found)
	{
		cout << subseq_to_find << " was found at position " << i - 1 << endl;
	}
	else
	{
		cout << "Not found" << endl;
	}
 
	// Keep the console window open (Windows-only; remove this line on other platforms)
	system("pause");
	return 0;
}
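
For comparison, the same search can be sketched with the standard library's std::string::find instead of a hand-written loop. This is only an illustrative alternative, not the course's required approach: the helper name find_first_motif and its convention of returning -1 for "not found" are assumptions introduced here and do not appear in the solution above.

```cpp
#include <string>

// Hypothetical helper (not part of the original solution).
// Returns the 0-based index of the first occurrence of `motif` in `dna_seq`,
// or -1 if `motif` does not occur anywhere in `dna_seq`.
long find_first_motif(const std::string& dna_seq, const std::string& motif = "TATA")
{
    // std::string::find returns std::string::npos when there is no match
    const std::string::size_type pos = dna_seq.find(motif);
    if (pos == std::string::npos)
    {
        return -1;
    }
    return static_cast<long>(pos);
}
```

Called on the test inputs listed in the comment above, find_first_motif("GATATA") gives 2 and find_first_motif("A") gives -1, which the caller would print as "Not found". Because find reports the first match directly, no i - 1 adjustment is needed when printing the position.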
cs-142/gene-finding-via-tata-box-search.txt · Last modified: 2015/05/12 18:40 by cs142ta
CC Attribution-Share Alike 4.0 International