Using Python
- The script is to open a given file. The user is to be asked what the name of the file is. The script will then open the file for processing and, when done, close that file.
- The script will produce an output file based on the name of the input file. The output file will have the same name as the input file except that it will begin with "Analysis-". This file will be opened and closed by the script.
- The script is to process the file, calculate the following information, and output that information to the file in the order presented here.
- The script is to count the number of lines, the number of words, and the number of characters in the file and write this information to the output file with appropriate labels so the reader knows what the numbers are. This information is to be echoed on the screen to the user.
- You may find it easier to determine the number of words if you remove the punctuation, digits, and other non-letter characters other than spaces before trying to count the words. Those items are not considered to be part of a word. Keep that in mind when referencing words in the following instructions.
- Count spaces, digits, punctuation, and other non-letter characters as characters, though.
- The script is to produce a list of all unique words in the file and the number of times each word appears in the file. This list with frequency counts is to be put in the output file in alphabetical order, one word/frequency pair to a line. The format should be word (frequency count). Be sure there is a space between the word and the opening parenthesis. You will count words that appear only once. Due to the possible length of this list, you are not to echo it to the screen; only place it in the output file.
- The script is to produce a list of two-word pairs found in the file that appear more than once. If a two-word pair appears only once, it is not to be put into the output file. The format of the line in the output file should be the two-word pair followed by the frequency count in parentheses, as seen in the previous item involving unique words. This list is put out after the single-word list. There is to be a heading for the list to let the user know that the information is changing, with a blank line put in before the heading. This information is to be echoed on the screen to the user.
- The last bit of information the script is to place into the output file is the total number of words, the average length of a word, the number of unique words, the average number of letters in the unique words, and the number of word pairs that have frequencies of 2 or more. Properly label each item of information output in this section, place a blank line before the section, and give the section a heading. This information is to be echoed on the screen to the user.
- It is fully conceivable that the average number of letters in a word (length of a word) for the overall document is different from the average number of letters in a word for the unique word list. This is because a word such as "the" might appear multiple times in the file. In the first calculation, each instance of "the" is counted. In the second calculation, the word "the" is counted only once on the list.
- The script is to use solid programming practices like comments, self-documenting variable names (also known as meaningful variable names), and neat, easy-to-read code.
- You are to place a comment block at the very top of the script containing your name, the semester, the due date of the exam, and the instructor's name, each on separate lines.
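The file handling and basic counts from the list above can be sketched as follows. This is a minimal sketch under stated assumptions, not a required design: the function names and label wording are hypothetical, newline characters are assumed to count as characters, and all whitespace (not only spaces) is kept when stripping non-letters so the text can still be split into words.

```python
import os


def build_output_name(input_name):
    """Return the input name with 'Analysis-' prefixed to the file name part."""
    folder, base = os.path.split(input_name)
    return os.path.join(folder, "Analysis-" + base)


def extract_words(text):
    """Drop every character that is neither a letter nor whitespace, then split."""
    letters_only = "".join(ch for ch in text if ch.isalpha() or ch.isspace())
    return letters_only.split()


def basic_count_lines(text):
    """Build the labeled report lines for the line, word, and character counts."""
    line_count = len(text.splitlines())
    word_count = len(extract_words(text))
    character_count = len(text)  # assumption: newlines count as characters
    return [
        f"Number of lines: {line_count}",
        f"Number of words: {word_count}",
        f"Number of characters: {character_count}",
    ]


def write_basic_counts(input_name):
    """Read the input file, then write the labeled counts and echo them."""
    # Each 'with' block closes its file when the block ends.
    with open(input_name, encoding="utf-8") as input_file:
        text = input_file.read()
    with open(build_output_name(input_name), "w", encoding="utf-8") as output_file:
        for report_line in basic_count_lines(text):
            output_file.write(report_line + "\n")
            print(report_line)
```

A script built this way could ask for the name with something like write_basic_counts(input("Enter the name of the file: ")).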
The logic is built to examine and process the incoming data for specific items of information. This may need to be done in a specific order with multiple processing steps.
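One possible ordering of those processing steps for the two lists is sketched below: count the single words first, then the adjacent pairs, then format both. It assumes the words have already been stripped of non-letters and lowercased (the assignment does not say whether case matters), that a pair may span a line break, and that pairs are listed alphabetically. The function names and the heading text are hypothetical.

```python
from collections import Counter


def repeated_pairs(words):
    """Count adjacent two-word pairs and keep only those seen more than once."""
    pair_counts = Counter(zip(words, words[1:]))
    return {pair: count for pair, count in pair_counts.items() if count > 1}


def frequency_report_lines(words):
    """Build the unique-word list followed by the repeated-pair list."""
    report_lines = []
    # Step 1: every unique word, alphabetical, as "word (count)".
    for word, count in sorted(Counter(words).items()):
        report_lines.append(f"{word} ({count})")
    # Step 2: blank line and heading before the pair list.
    report_lines.append("")
    report_lines.append("Word pairs appearing more than once")
    # Step 3: only pairs that appear at least twice, as "first second (count)".
    for (first, second), count in sorted(repeated_pairs(words).items()):
        report_lines.append(f"{first} {second} ({count})")
    return report_lines
```

Because the unique-word list goes only to the output file while the pair list is also echoed, a script might write every returned line to the file but print only the lines from the heading onward.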
You are to run your script on this test data file. Take a screenshot of your interactions with the user for your submission document. Then place your analysis file, your Python code file, and your submission document into a single zip file.
Some advice: since you have the test data file, you can do these calculations by hand and check them against your analysis file to see if your program is working correctly.
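A hand check is easiest on a sample small enough to work out on paper. The sketch below computes the closing summary values for a five-word sample; the function names and dictionary labels are hypothetical, and the words are assumed to be already cleaned and lowercased.

```python
from collections import Counter


def average_length(words):
    """Average number of letters per word, or 0.0 for an empty collection."""
    words = list(words)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def summary_values(words):
    """Compute the closing summary figures for a list of cleaned words."""
    unique_words = set(words)
    pair_counts = Counter(zip(words, words[1:]))
    repeated_pair_count = sum(1 for count in pair_counts.values() if count >= 2)
    return {
        "Total words": len(words),
        "Average word length": average_length(words),
        "Unique words": len(unique_words),
        "Average unique word length": average_length(unique_words),
        "Word pairs with frequency 2 or more": repeated_pair_count,
    }


# By hand for "the cat chased the cat":
#   all words:    3 + 3 + 6 + 3 + 3 = 18 letters over 5 words
#   unique words: the, cat, chased  = 12 letters over 3 words
#   repeated pairs: only "the cat" appears twice
sample_summary = summary_values("the cat chased the cat".split())
```

The two averages differ here because "the" and "cat" are each counted twice in the first calculation and once in the second.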