Advanced C++ coding discussion

The goal of this assignment is to give you practice with function and class templates and with applying other concepts covered in the course.

You are to create a generic (class template) hash table. You will use this template to create a “Dictionary” of words. The words will be stored in a (library) class type that you will derive from string. You will use your hash table Dictionary to spell check a document and perform some analysis of the efficiency of your hash table.

Program Steps

  1. Derive a class from string, called Mystring. You will use this class to store words from the Dictionary file. The class should contain the necessary constructors, a conversion operator to convert a Mystring to unsigned int, a tolower function to convert the letters in a word to lower case, and a removePunctuation function to strip non-alphabetic characters from both ends of the word. This class should be tested and then placed in a library.
  2. Create a generic linked list. Do this by creating Node and List class templates. You will use the list by instantiating a template class of type Mystring.
  3. Create a hash-table class template, called Myhash. You may use the hash function shown below for your hashing algorithm. To resolve “collisions” in your hash table, use “chaining” implemented with your generic linked list from the previous step. You will instantiate your hash table class template to create a template class of Mystring. The Myhash class template should have size, insert and find member functions, plus the member functions needed to produce the 3 statistics shown in the output below (percent of buckets used, average non-empty bucket size, and largest bucket size). The Myhash insert function should throw a DuplicateError exception when an attempt is made to insert a duplicate Mystring into the Myhash template class object (the dictionary).
  4. Create a DuplicateError class template, derived from the logic_error standard exception. This class template will be instantiated to create a template class of Mystring.
  5. Run the code found in the suggested main function below. Your output should identify the duplicate words (type Mystring) that the program attempted to insert into the Dictionary, the number of words in the Dictionary, the hash-table bucket statistics (shown in the sample output), and the misspelled words (plus their total).
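
Step 1 can be sketched roughly as follows. This is only one possible shape for Mystring, not a required design: the choice of constructors and the way the conversion operator turns characters into a number (a base-31 polynomial here) are assumptions, since the assignment does not prescribe them.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch of step 1: a word type derived publicly from std::string.
class Mystring : public std::string {
public:
    Mystring() = default;
    Mystring(const std::string& s) : std::string(s) {}
    Mystring(const char* s) : std::string(s) {}

    // Conversion to unsigned so a word can be passed to the hash function.
    // Assumption: base-31 polynomial over the characters (wraps on overflow).
    operator unsigned() const {
        unsigned value = 0;
        for (unsigned char ch : *this)
            value = value * 31u + ch;
        return value;
    }

    // Convert every letter to lower case, in place.
    void tolower() {
        for (char& ch : *this)
            ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }

    // Strip non-alphabetic characters from both ends of the word.
    void removePunctuation() {
        std::size_t first = 0;
        while (first < size() &&
               !std::isalpha(static_cast<unsigned char>((*this)[first])))
            ++first;
        std::size_t last = size();
        while (last > first &&
               !std::isalpha(static_cast<unsigned char>((*this)[last - 1])))
            --last;
        assign(substr(first, last - first));
    }
};
```

With this sketch, a Mystring built from "(Hello!)" becomes "hello" after tolower() and removePunctuation(), and a word made only of digits or punctuation becomes empty, which is why the suggested main function skips empty words.
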
// hash function adapted from: Thomas Wang https://gist.github.com/badboy/6267743
// buckets: the number of buckets in the hash table (supplied by your class)
unsigned hash(unsigned key) {
    const unsigned c2 = 0x27d4eb2d; // a prime or an odd constant
    key = (key ^ 61) ^ (key >> 16);
    key = key + (key << 3);
    key = key ^ (key >> 4);
    key = key * c2;
    key = key ^ (key >> 15);
    return key % buckets;
}

Note: if you use this function as a member function, make it a const member function.
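
Steps 2 through 4 might fit together along the following lines. This is a minimal sketch under stated assumptions, not a reference solution: the member names push_front and contains, the boolean constructor flag (inferred from the suggested main function), and the "Duplicate: " message text are all illustrative choices, and the statistics members are left out to keep it short. The hash function above is used as a const member, with the bucket count taken from a template parameter.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Step 4 sketch: exception whose message carries the rejected value.
template <typename T>
class DuplicateError : public std::logic_error {
public:
    explicit DuplicateError(const T& value) : std::logic_error(describe(value)) {}
private:
    static std::string describe(const T& value) {
        std::ostringstream out;
        out << "Duplicate: " << value;  // assumes T can be streamed
        return out.str();
    }
};

// Step 2 sketch: minimal singly linked list used as one bucket's chain.
template <typename T>
struct Node {
    T data;
    Node* next;
};

template <typename T>
class List {
public:
    List() = default;
    List(const List&) = delete;             // copying is not needed here
    List& operator=(const List&) = delete;
    ~List() {
        while (head_) { Node<T>* dead = head_; head_ = head_->next; delete dead; }
    }
    void push_front(const T& value) { head_ = new Node<T>{value, head_}; ++size_; }
    bool contains(const T& value) const {
        for (const Node<T>* n = head_; n; n = n->next)
            if (n->data == value) return true;
        return false;
    }
    std::size_t size() const { return size_; }
private:
    Node<T>* head_ = nullptr;
    std::size_t size_ = 0;
};

// Step 3 sketch: fixed bucket count, collisions resolved by chaining.
// T must be convertible to unsigned so it can be hashed.
template <typename T, unsigned Buckets>
class Myhash {
    static_assert(Buckets > 0, "a hash table needs at least one bucket");
public:
    explicit Myhash(bool throwOnDuplicate = false)
        : throwOnDuplicate_(throwOnDuplicate) {}

    void insert(const T& value) {
        List<T>& chain = table_[hash(static_cast<unsigned>(value))];
        if (chain.contains(value)) {
            if (throwOnDuplicate_) throw DuplicateError<T>(value);
            return;                          // silently ignore the duplicate
        }
        chain.push_front(value);
        ++size_;
    }
    bool find(const T& value) const {
        return table_[hash(static_cast<unsigned>(value))].contains(value);
    }
    std::size_t size() const { return size_; }

private:
    // The hash function from the assignment, as a const member function.
    unsigned hash(unsigned key) const {
        const unsigned c2 = 0x27d4eb2d;
        key = (key ^ 61) ^ (key >> 16);
        key = key + (key << 3);
        key = key ^ (key >> 4);
        key = key * c2;
        key = key ^ (key >> 15);
        return key % Buckets;
    }
    List<T> table_[Buckets];
    std::size_t size_ = 0;
    bool throwOnDuplicate_;
};
```

The sample output prints the element type in the duplicate message ("Duplicate Mystring: ..."); how that type name is produced is left open here.
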

Submission Requirements

Submit all of your source files, header files, and your archived Mystring library file in a compressed (zipped) file, contained in one folder. The instructor will unzip your files, compile and execute your program. Also, make sure you identify your compiler and OS in a comment in your main function.


Suggested main function

#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>
// Also include your own Mystring, Myhash and DuplicateError headers here.
using namespace std;

int main()
{
    Myhash<Mystring,1500> Dictionary(true); // throw if duplicate words
    Mystring buffer;

    const string DictionaryFileName = "c:/temp/words";
    const string DocumentFileName = "c:/temp/roosevelt_first_inaugural.txt";

    ifstream fin(DictionaryFileName.c_str());
    if (!fin)
    {
        cerr << "Can't find " << DictionaryFileName << endl;
        exit(-1);
    }
    while (getline(fin, buffer))
    {
        // remove trailing '\r' if present (this is for Mac/Linux)
        if (!buffer.empty() && buffer[buffer.size() - 1] == '\r')
            buffer.resize(buffer.size() - 1);
        buffer.tolower();
        try
        {
            Dictionary.insert(buffer);
        }
        catch (const DuplicateError<Mystring>& error)
        {
            cout << error.what() << endl;
        }
    }

    cout << "Number of words in the dictionary = " << Dictionary.size() << endl;
    cout << "Percent of hash table buckets used = " << setprecision(2) << fixed << 100 * Dictionary.percentOfBucketsUsed() << '%' << endl;
    cout << "Average non-empty bucket size = " << Dictionary.averageNonEmptyBucketSize() << endl;
    cout << "Largest bucket size = " << Dictionary.largestBucketSize() << endl;

    fin.close();
    fin.clear();

    // Spellcheck
    unsigned misspelledWords = 0;

    fin.open(DocumentFileName.c_str());
    if (!fin)
    {
        cerr << "Can't find " << DocumentFileName << endl;
        exit(-1);
    }
    while (fin >> buffer)
    {
        buffer.tolower();
        buffer.removePunctuation();
        if (!buffer.size())
            continue;
        if (!Dictionary.find(buffer))
        {
            misspelledWords++;
            cout << "Not found in the dictionary: " << buffer << endl;
        }
    }
    cout << "Total misspelled words = " << misspelledWords << endl;
}


Sample Output

Your output should look “similar” to the following:

Duplicate Mystring: clone
Duplicate Mystring: duplicate
Duplicate Mystring: resting
Duplicate Mystring: triplet
Duplicate Mystring: triplet
Duplicate Mystring: twin
Duplicate Mystring: two
Number of words in the dictionary = 24048
Percent of hash table buckets used = 100.00% <- You might see something different here (like 57.60%)
Average non-empty bucket size = 16.03
Largest bucket size = 43
Not found in the dictionary: consecration
Not found in the dictionary: induction
Not found in the dictionary: presidency
Not found in the dictionary: candor
Not found in the dictionary: impels
Not found in the dictionary: preeminently
Not found in the dictionary: boldly
Not found in the dictionary: conditions
Not found in the dictionary: nameless
Not found in the dictionary: paralyzes
Not found in the dictionary: retreat
Not found in the dictionary: frankness
Not found in the dictionary: vigor
Not found in the dictionary: understanding
Not found in the dictionary: convinced

Not found in the dictionary: rounded
Not found in the dictionary: registered
Not found in the dictionary: asked
Not found in the dictionary: direction
Not found in the dictionary: humbly
Total misspelled words = ??? <- The number should be between 150 and 200
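
The three bucket statistics in the sample output are related by simple arithmetic. With 24048 words in the 1500 buckets of the suggested main function, and every bucket in use, the average non-empty bucket size is 24048 / 1500 = 16.032, which prints as 16.03. The sketch below computes the same quantities from a list of per-bucket sizes; the names BucketStats and summarize are hypothetical and only illustrate the calculation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type, for illustration only.
struct BucketStats {
    double fractionUsed;     // non-empty buckets / all buckets (multiply by 100 for percent)
    double averageNonEmpty;  // stored items / non-empty buckets
    std::size_t largest;     // length of the longest chain
};

// Compute the three statistics from the size of each bucket's chain.
BucketStats summarize(const std::vector<std::size_t>& bucketSizes) {
    std::size_t used = 0, items = 0, largest = 0;
    for (std::size_t n : bucketSizes) {
        if (n > 0) ++used;
        items += n;
        if (n > largest) largest = n;
    }
    BucketStats s;
    s.fractionUsed = bucketSizes.empty()
        ? 0.0 : static_cast<double>(used) / bucketSizes.size();
    s.averageNonEmpty = used ? static_cast<double>(items) / used : 0.0;
    s.largest = largest;
    return s;
}
```

Note that the average is taken over non-empty buckets only, so a table that uses fewer buckets reports a larger average for the same number of words.
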