Regular expression in Java (5 points)

Due date: Jan 23/25, 2019, Lab time

Where to submit

You can test and submit your assignment at our submission site. The last submission will be used for evaluating your assignment. You are welcome to contact our GAs for help with your assignment and its marking.

Here is a list of common errors that you may want to double check before submission:

  • Print statements: our marking program does not allow them, and you won't be able to see the printed result on our server anyway, so remove all print statements.
  • Inner classes: the marking program changes the class names, but only the outermost class name. If you have an inner class, there will be two different class names, which confuses the marking program. No inner class is needed for this assignment, so remove any inner classes.
  • Missing close(): sometimes the output file is empty because you removed the close() statement to shorten the code. You cannot save space this way.
  • Avoid extra-long lines.

    Purpose

    Warm up with Java programming. Get familiar with regular expressions. Understand the wide range of applications of regular expressions.

    Assignment specification

    Your job is to count the number of identifiers in programs written in our Tiny language.

    You should pick out the identifiers from a text file and write the output to a text file (named A1.output). Note that the output file should contain a line like "identifiers:5". Here are the sample input and output files. The input will have multiple lines. Please note that in this sample program the following are not counted as identifiers:

    Here are the test cases for the assignment: case 1, case 2, case 3, case 4, case 5, case 6. (ID counts: 5 4 6 7 8 9).

    If you cannot pass test case 6 and you use [^a-zA-Z] as the delimiter in Scanner, you can solve the problem by specifying the encoding as UTF-8, i.e., use

     new Scanner(yourFile, "UTF-8")... 
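
A minimal sketch of that Scanner setup, for illustration only. The class and method names (ScannerTokens, letterRuns) are hypothetical, and the delimiter is written as [^a-zA-Z]+ (with an added +) on the assumption that empty tokens between adjacent delimiters are unwanted. Note that this delimiter also splits on digits, so a name such as y1 would come back as y.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; a real submission must be named A11 or A12.
class ScannerTokens {
    // Collects the runs of ASCII letters that the scanner yields.
    static List<String> letterRuns(Scanner in) {
        // Assumption: "+" is added so that no empty tokens are returned.
        in.useDelimiter("[^a-zA-Z]+");
        List<String> tokens = new ArrayList<>();
        while (in.hasNext()) {
            tokens.add(in.next());
        }
        return tokens;
    }

    // Opens the file as UTF-8 and closes the scanner when done.
    static List<String> letterRuns(File file) throws FileNotFoundException {
        try (Scanner in = new Scanner(file, "UTF-8")) {
            return letterRuns(in);
        }
    }
}
```

With the explicit encoding, a non-ASCII character in the input is decoded as one character and then treated as a delimiter like any other non-letter.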
    

    In this assignment you may assume that there are no comments in the programs.

    In the output file you should only write "identifiers:" followed by the number of identifiers. If there are multiple occurrences of an identifier in the input, you should only count it once. Don't write anything else into the output file.

    You will write two different programs to do this:

    1. Program A11.java must not use regular expressions: neither the regex package nor the methods involving regular expressions in the String class or other classes. Your program can look at the characters one by one, using a loop to check whether they form quoted strings, identifiers, etc.
    2. Program A12.java will use java.util.regex. Two useful links to start with are the JavaDoc of regex and a tutorial for Java regex.
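
The two approaches can be sketched side by side as follows. This is an illustration under stated assumptions, not the sample solution: an identifier is assumed to be an ASCII letter followed by letters and digits, a quoted string is assumed to run from one double quote to the next with no escapes, and numeric literals get no special treatment. The class name IdentifierSketch and the excluded parameter are hypothetical; which words do not count as identifiers is defined by the assignment, so the set is passed in rather than guessed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; real submissions are A11.java and A12.java.
class IdentifierSketch {
    // Assumed token shapes: a quoted string (possibly unterminated),
    // or a letter followed by letters and digits.
    private static final Pattern TOKEN =
        Pattern.compile("\"[^\"]*\"?|[a-zA-Z][a-zA-Z0-9]*");

    // Regex version: each match is a whole string or a whole word.
    static Set<String> withRegex(String source, Set<String> excluded) {
        Set<String> ids = new HashSet<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String token = m.group();
            if (token.charAt(0) != '"' && !excluded.contains(token)) {
                ids.add(token); // the set keeps each name only once
            }
        }
        return ids;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isLetterOrDigit(char c) {
        return isLetter(c) || (c >= '0' && c <= '9');
    }

    // Loop version: no regex classes or regex-based String methods.
    static Set<String> withLoop(String source, Set<String> excluded) {
        Set<String> ids = new HashSet<>();
        int n = source.length();
        int i = 0;
        while (i < n) {
            char c = source.charAt(i);
            if (c == '"') {
                // Skip to the closing quote, or to the end if none.
                i++;
                while (i < n && source.charAt(i) != '"') {
                    i++;
                }
                i++;
            } else if (isLetter(c)) {
                int start = i;
                while (i < n && isLetterOrDigit(source.charAt(i))) {
                    i++;
                }
                String word = source.substring(start, i);
                if (!excluded.contains(word)) {
                    ids.add(word);
                }
            } else {
                i++;
            }
        }
        return ids;
    }
}
```

Both methods return the set of distinct names, so the number to report would be the size of that set.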

    Your programs should be able to run by typing:

      %javac A11.java 
      %java A11 A1.tiny
      %javac A12.java 
      %java A12 A1.tiny
    

    In this assignment, the output should be in a file called "A1.output". You should not use keyboard input. The input file name will be provided as the argument of the program, while the output file name is hard-coded in your programs. That is, your input and output code can look like the following:

                ...  new BufferedReader(new FileReader(args[0]));
                ...  new BufferedWriter(new FileWriter("A1.output"));
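
The two lines above do not show where the streams get closed. One way to make sure the writer is always closed (an unclosed BufferedWriter can leave the output file empty) is try-with-resources, sketched below. The class and method names (IoSkeleton, readAll, writeCount) are hypothetical, and the sketch assumes the whole input fits in memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical class; real submissions are A11.java and A12.java.
class IoSkeleton {
    // Reads the whole input file into one string, line by line.
    static String readAll(String inputName) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in =
                 new BufferedReader(new FileReader(inputName))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        } // the reader is closed here
        return text.toString();
    }

    // Writes "identifiers:" followed by the count and nothing else.
    static void writeCount(String outputName, int count)
            throws IOException {
        try (BufferedWriter out =
                 new BufferedWriter(new FileWriter(outputName))) {
            out.write("identifiers:" + count);
        } // closing the writer flushes the buffer to the file
    }
}
```

A main method would call readAll(args[0]), compute the count, and then call writeCount("A1.output", count).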
    
    Your program should be tested on luna or bravo.

    Please don't write unnecessarily long programs. The sample solutions for A11 and A12 total approximately 300 words, as counted by the PHP function str_word_count(). They were not deliberately written to be short and could easily be made more compact. Hence one mark is given if your word count is smaller than 300.

    Marking Scheme

    This time we give you an extra mark if your code length is at most 72 according to our website (or 32 according to wc). That is, the total mark could be 6+1=7.
     yourMark=0;
     if (A11.java, A12.java are not sent properly) return; 
     for (each of A11, A12) 
          if (it is compiled correctly)  yourMark+=0.2;  
     for (each of A11, A12) {
          if (your Java program reads A1.tiny && generates result file A1.output)
               for (each of the 6 test cases)
                    if (it is correct)   yourMark+=0.3;
          if (yourCode.length() < average(length of A11 in the class))   yourMark+=0.5;
     }
      for (each day of your late submission)  yourMark=yourMark*0.8;
      One bonus mark for the shortest code in the class.
    

    What to submit

    You should submit A11.java and A12.java.