GNBF5010 Homework 2
Please zip all your files for Homework 2, including the scripts and any input and output files, into a
single file called YourLastname_Firstname_HW2.zip (or .rar). Then submit it on Blackboard
on or before Wednesday, 23 October 2019.
NOTE 1: Add comments to your programs to explain your code. Examples of
commenting can be found in the textbook.
NOTE 2: Test your programs with various test cases to ensure that they work properly.
1. Unknown Letters
Write a program to list which letters in the file seqs.txt are not A, T, C, or G. It should list
each letter only once. Hint: Start with an empty list for unknown letters. Then use two nested loops,
one over the sequences and one over the letters of each sequence.
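The hint above could be sketched as follows. This is only a minimal illustration: the function name find_unknown_letters and the sample sequences are hypothetical, and it assumes seqs.txt holds one sequence per line, which is not stated here.

```python
def find_unknown_letters(sequences):
    """Return the letters other than A, T, C, G, each listed once,
    in order of first appearance."""
    unknown = []                        # start with an empty list, as in the hint
    for seq in sequences:               # outer loop: one sequence at a time
        for letter in seq.strip():      # inner loop: one letter at a time
            if letter not in "ATCG" and letter not in unknown:
                unknown.append(letter)
    return unknown


# Hypothetical sample data standing in for the lines of seqs.txt
sample = ["ATCGNNA\n", "GGRTCA\n", "ATNYCG\n"]
print(find_unknown_letters(sample))     # ['N', 'R', 'Y']
```

Because a file object yields its lines when iterated, the same function could be called on an open file handle for seqs.txt. Lowercase letters are treated as unknown in this sketch; whether that is intended is an assumption to check against the actual input file.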
2. Sequence Properties
Write a program that 1) reads all sequences in seqs.txt and stores them in a list called seqs, 2)
presents the user with a menu for selecting various properties of the sequences, and 3) shows the
corresponding results based on the user's choice. The menu for selection should include:
1) Number of sequences in the input file
2) Number of occurrences of a specific sequence, e.g. GGATC (The program will prompt
another message to the user for the target sequence.)
3) Number of sequences that are longer than a particular length, e.g. 1000 bases (The
program will ask the user again for the minimum length.)
4) Number of sequences with GC content higher than a given value, e.g. 50% (The GC
content could be calculated as (num_of_G + num_of_C) / seq_total_len )
5) The combination of choices 3 and 4: Number of sequences longer than a particular
length and with GC content over a particular value
In your program, there should be separate functions for the analysis in options 1 to 4. Your
program should work like this:
Please select the sequences property that you want to display, or press 0 to
exit the program.
1) Total number of sequences
2) Number of pattern occurrences
3) Number of sequences with length >= min_len
4) Number of sequences with GC% >= min_GC
5) Number of sequences with length >= min_len and GC% >= min_GC
Enter the choice: 4
Enter the minimum GC content (min_GC): 50
Calculating …
There are 36 sequences with GC% >= 50%.
==
Please select the sequences property that you want to display, or press 0 to
exit the program.
1) Total number of sequences
2) Number of pattern occurrences
3) Number of sequences with length >= min_len
4) Number of sequences with GC% >= min_GC
5) Number of sequences with length >= min_len and GC% >= min_GC
Enter the choice: 5
Enter the minimum length (min_len): 1000
Enter the minimum GC content (min_GC): 40
Calculating …
There are 10 sequences with length >= 1000 bases and GC% >= 40%.
==
Please select the sequences property that you want to display, or press 0 to
exit the program.
1) Total number of sequences
2) Number of pattern occurrences
3) Number of sequences with length >= min_len
4) Number of sequences with GC% >= min_GC
5) Number of sequences with length >= min_len and GC% >= min_GC
Enter the choice: 0
Exiting the program …
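One possible layout for the separate analysis functions is sketched below. All function names and the sample sequences are hypothetical. The sketch follows the >= comparisons shown in the sample session, takes GC thresholds as percentages, and counts non-overlapping pattern occurrences summed over all sequences; each of these is an assumption about the intended behaviour. The menu loop itself is omitted.

```python
def count_sequences(seqs):
    """Option 1: total number of sequences."""
    return len(seqs)


def count_pattern(seqs, pattern):
    """Option 2: occurrences of pattern summed over all sequences.
    str.count counts non-overlapping matches (an assumption here)."""
    return sum(seq.count(pattern) for seq in seqs)


def gc_content(seq):
    """GC content as a fraction: (num_of_G + num_of_C) / seq_total_len."""
    if not seq:
        return 0.0                      # avoid dividing by zero on an empty line
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_min_length(seqs, min_len):
    """Option 3: number of sequences with length >= min_len."""
    return sum(1 for seq in seqs if len(seq) >= min_len)


def count_min_gc(seqs, min_gc):
    """Option 4: number of sequences with GC% >= min_gc (in percent)."""
    return sum(1 for seq in seqs if gc_content(seq) * 100 >= min_gc)


def count_min_length_and_gc(seqs, min_len, min_gc):
    """Option 5: both conditions must hold for the same sequence."""
    return sum(1 for seq in seqs
               if len(seq) >= min_len and gc_content(seq) * 100 >= min_gc)


# Hypothetical sample data standing in for the contents of seqs.txt
seqs = ["GGATCC", "ATATAT", "GGATCGGATC", "GCGC"]
print(count_sequences(seqs))                  # 4
print(count_pattern(seqs, "GGATC"))           # 3
print(count_min_length(seqs, 6))              # 3
print(count_min_gc(seqs, 50))                 # 3
print(count_min_length_and_gc(seqs, 6, 50))   # 2
```

Option 5 is written as its own pass over the list because the two counts from options 3 and 4 cannot simply be combined: the same sequence has to satisfy both conditions.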
3. Unique Words
Write a program that displays a list of all the unique words found in the file uniq_words.txt.
Print your results in alphabetical order and in lowercase. Hint: Store the words as the elements of a set;
remove punctuation using string.punctuation from the string module.
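The hint could be sketched as follows, assuming words are separated by whitespace. The function name unique_words and the sample text are hypothetical stand-ins for the contents of uniq_words.txt.

```python
import string


def unique_words(text):
    """Return the unique words in text, lowercased and sorted alphabetically."""
    # Translation table that deletes every character in string.punctuation
    table = str.maketrans("", "", string.punctuation)
    words = text.translate(table).lower().split()
    return sorted(set(words))           # the set removes duplicates


# Hypothetical sample text standing in for uniq_words.txt
sample_text = "The cat sat. The dog, the CAT, and the bird!"
print(unique_words(sample_text))
# ['and', 'bird', 'cat', 'dog', 'sat', 'the']
```

Note that deleting punctuation outright joins the parts of words such as "don't" or "well-known"; whether that matters depends on the actual text, so treat this as one reasonable choice rather than the required one.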
4. Molecular Weight
a) Make a Python dictionary mapping one-letter amino acid codes (the keys) to their molecular
weights (the values), for all 22 amino acids. The molecular weights of the 22 amino acids can be
found in the table on the next page. As an example, the molecular weight of C (Cysteine) is 121.
b) Print out a list of all the amino acids sorted by their molecular weights from the heaviest to
the lightest. Hint: You may need to sort the items of the dictionary in question (a) based on
the values; example output:
AA MW
W 204Da
Y 181Da
R 174Da
F 165Da
… …
c) Read the protein sequence from lysozyme.fasta and calculate the molecular weight of
this protein using the dictionary created in question (a).
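The three parts could be sketched as follows. The dictionary here is deliberately partial: it holds only the five weights quoted in this handout (C, W, Y, R, F), and the remaining entries would come from the table. The function names and the toy FASTA record are hypothetical, and the protein weight is taken as a plain sum of the residue weights, which is an assumption about what part (c) expects.

```python
# Part (a), PARTIAL: only the five weights quoted in the handout.
aa_weights = {"C": 121, "W": 204, "Y": 181, "R": 174, "F": 165}


def sorted_by_weight(weights):
    """Part (b): (code, weight) pairs from the heaviest to the lightest."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


def read_fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA file,
    skipping the header line that starts with '>'."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def protein_weight(sequence, weights):
    """Part (c): plain sum of the residue weights (an assumption)."""
    return sum(weights[aa] for aa in sequence)


print("AA MW")
for aa, mw in sorted_by_weight(aa_weights):
    print(f"{aa} {mw}Da")

# Hypothetical toy record standing in for the lines of lysozyme.fasta
fasta_lines = [">toy_peptide\n", "CWY\n", "RF\n"]
seq = read_fasta_sequence(fasta_lines)
print(seq, protein_weight(seq, aa_weights))   # CWYRF 845
```

A plain sum ignores the water molecule (about 18 Da) released for each peptide bond, so it overestimates the mass of a real polypeptide; whether to correct for this depends on what the assignment intends.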
5. Palindromic sequence
A palindromic sequence is a nucleic acid sequence in a double-stranded DNA or RNA molecule
wherein reading in a certain direction (e.g. 5' to 3') on one strand matches the sequence reading
in the same direction (e.g. 5' to 3') on the complementary strand. Here is an example:
5'-GAATTC-3'
3'-CTTAAG-5'
On both strands, reading from 5' to 3' gives the same sequence: GAATTC. The DNA
sequence GAATTC is thus said to be palindromic. For more details about the function of
palindromic sequences, see here. Now, write a program that reads DNA sequences from the file
palin_seq.txt and uses recursion to determine whether each of them is a palindromic
sequence. Print the results of your program in the following format.
1) ATCGAT --- YES
2) GAATTC --- YES
3) ATCGGCTA --- NO

Hint: Use string slicing to refer to and compare the characters on either end of the sequence string.
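The hint could be sketched as follows. Under the definition above, the first base must be the complement of the last base (not identical to it), and the same must then hold for the slice between them. The function name is_palindromic and the COMPLEMENT table are hypothetical names, and the input is assumed to contain only uppercase A, T, C and G.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(seq):
    """Recursively check whether seq equals its own reverse complement."""
    if len(seq) == 0:
        return True                     # every pair of ends has matched
    if len(seq) == 1:
        return False                    # a middle base cannot be its own complement
    if COMPLEMENT.get(seq[0]) != seq[-1]:
        return False                    # the two ends do not pair up
    return is_palindromic(seq[1:-1])    # recurse on the slice between the ends


# The three sequences from the expected output above
for i, seq in enumerate(["ATCGAT", "GAATTC", "ATCGGCTA"], start=1):
    print(f"{i}) {seq} --- {'YES' if is_palindromic(seq) else 'NO'}")
```

This prints YES, YES and NO for the three sequences, matching the format shown. A plain string palindrome test (comparing seq[0] == seq[-1]) would reject ATCGAT, so the expected output confirms that complementary ends are what should be compared.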