copyright Steve J. Hodges

CS 19 Fall 2016

Assignment 2 (Word Counter)



Program Description

In this assignment, you will read an unspecified number of lines of text from STDIN. Each line will contain an unspecified number of words. You may assume that no line will be longer than 255 characters. Input is terminated with EOF (ctrl-D). You may assume that, at most, 5000 unique words will be entered and that no word will be more than 15 characters long. (A word, here, is defined as a sequence of alphanumeric characters and punctuation delimited by whitespace.) Your program should discard punctuation that is found at the end of a word, but preserve punctuation inside a word. (Example: "blue:" is the same word as "blue", but "isn't" is not the same word as "isnt".) You do not need to preserve the case of the input words, but words need to match in a case-insensitive manner. (Suggestion: lowercase all of the words.)
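
The punctuation and case rules above can be sketched as a small helper. This is only one possible approach, assuming C++ style strings; the function name normalizeWord is made up for illustration and is not required by the assignment.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (the name is an assumption, not part of the assignment):
// returns the word in lowercase with any punctuation at its end removed.
std::string normalizeWord(std::string word) {
    // Drop trailing punctuation, so "blue:" becomes "blue".
    // Punctuation inside the word (the apostrophe in "isn't") is untouched.
    while (!word.empty() &&
           std::ispunct(static_cast<unsigned char>(word.back()))) {
        word.pop_back();
    }
    // Lowercase what is left, so "But" and "but" count as the same word.
    for (char& c : word) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return word;
}
```

Note that a token made up entirely of punctuation comes back as an empty string, so the caller would probably want to skip empty results rather than count them.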

When the input is complete, your program should print the words, one per line, in alphabetical order. Each word should be followed by an integer count of the number of times that word appeared in the text. Use a single space to separate the word and the count. Your program should have no other output.


You may use either C style strings (character arrays) or C++ style strings (string class) for this assignment; both are described in chapter 9 of our text. Please use whichever you are most familiar with. If you haven't used character arrays before, please use the string class for this assignment. If you use character arrays, the functions strcmp(), strcpy(), strtok() and strlen() will be helpful. If you use C++ style strings, you may find the palindrome example in chapter 9 helpful.


To store your words and counts, you may choose to use either arrays or vectors. I suggest using whichever you are most comfortable using. If you don't have a preference, then I suggest using vectors, as they will provide more "built-in" functionality for you. The easiest way to store the words is to use a string array/vector for the words and an integer array/vector to store the count of each word.

Before you add a word, you should check to see if it is already in your array/vector of words. If you decide to use arrays, you should use an integer variable to keep track of how many words are stored in the array. (If you decide to use a vector, you could use a separate variable or the built-in vector size functionality.) For sorting the words, you may use either of the sorting routines that we've discussed in class.
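
One thing to watch when sorting: if the words and counts live in two separate arrays/vectors, every swap of two words must also swap the matching counts. The sketch below uses a selection sort as an assumed example (the routines covered in class may differ) and assumes vectors; the name sortWords is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: selection sort over two parallel vectors.
// Assumes words[i] and counts[i] always describe the same word.
void sortWords(std::vector<std::string>& words, std::vector<int>& counts) {
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        // Find the alphabetically smallest word in positions i..end.
        std::size_t smallest = i;
        for (std::size_t j = i + 1; j < words.size(); ++j) {
            if (words[j] < words[smallest]) {
                smallest = j;
            }
        }
        // Swap both vectors together so each count stays with its word.
        if (smallest != i) {
            std::swap(words[i], words[smallest]);
            std::swap(counts[i], counts[smallest]);
        }
    }
}
```

Because the words were lowercased before being stored, the plain < comparison on strings is enough to produce alphabetical order here.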

You may find it helpful to write a function that adds a word to your array/vector, or increments the count if the word is already there. Parameters to your function might be (1) the word to add, (2) the array/vector of your words, (3) the array/vector of the count of each of your words, and (4) the number of words in your table. You may also find it helpful to write a function that looks up a word in your array/vector.
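
A minimal sketch of these two functions is shown below, assuming vectors and C++ style strings. The names findWord and addWord are made up for illustration. With vectors, the fourth parameter (the number of words in the table) is not needed because words.size() already provides it; an array version would pass that count by reference and increment it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical lookup: returns the position of word in words, or -1 if absent.
int findWord(const std::string& word, const std::vector<std::string>& words) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i] == word) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Hypothetical add-or-increment: bumps the count of a known word,
// or appends a new word with a count of 1.
void addWord(const std::string& word,
             std::vector<std::string>& words,
             std::vector<int>& counts) {
    int index = findWord(word, words);
    if (index >= 0) {
        ++counts[static_cast<std::size_t>(index)];
    } else {
        words.push_back(word);
        counts.push_back(1);
    }
}
```

The word passed in is assumed to be already lowercased and stripped of trailing punctuation, so a simple == comparison is enough to detect a repeat.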

Suggested order for writing this program:

  1. input multiple lines of text - terminated with EOF
  2. break each line into separate words
    (or read each word individually and skip step 1)
  3. add each word to your array/vector
  4. modify code so that you check to see if a word is already in the array/vector before you add it
  5. sort the words in the table before output
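
Steps 1 and 2 above can be sketched as follows, assuming C++ style strings and a string stream to do the splitting (strtok() would be the character array equivalent). The name splitLine is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: breaks one line into whitespace-delimited tokens.
// Punctuation is left attached; cleaning it up is a separate step.
std::vector<std::string> splitLine(const std::string& line) {
    std::istringstream stream(line);
    std::vector<std::string> tokens;
    std::string token;
    // operator>> skips spaces and tabs and stops when the line runs out.
    while (stream >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

In main, a loop such as while (std::getline(std::cin, line)) keeps reading lines until EOF, and each line can then be handed to a function like this one. A blank input line simply produces no tokens.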

What to turn in

As usual, leave your .cpp file for this program ("wcount.cpp") in your home directory on pengo.


Sample Input

This may be the way the world ends not with
a bang
but with a temper tantrum.
Okay, a temporary government shutdown
which became almost inevitable
after Sunday's House vote to provide government

funding only on unacceptable conditions
wouldn't be the end of the world.
But a United States government default,
which will happen unless Congress raises
the debt ceiling soon,
might cause financial catastrophe.

Unfortunately, many Republicans either don't understand
this or don't care.

(matching) Sample Output

a 4
after 1
almost 1
bang 1
be 2
became 1
but 2
care 1
catastrophe 1
cause 1
ceiling 1
conditions 1
congress 1
debt 1
default 1
don't 2
either 1
end 1
ends 1
financial 1
funding 1
government 3
happen 1
house 1
inevitable 1
many 1
may 1
might 1
not 1
of 1
okay 1
on 1
only 1
or 1
provide 1
raises 1
republicans 1
shutdown 1
soon 1
states 1
sunday's 1
tantrum 1
temper 1
temporary 1
the 5
this 2
to 1
unacceptable 1
understand 1
unfortunately 1
united 1
unless 1
vote 1
way 1
which 2
will 1
with 2
world 2
wouldn't 1