Sunday 17 February 2013

Simple and efficient word count

Question by Cheliyan Natarajan

I need some technical help. How can I find the number of occurrences of all the words in a string? The criterion is that it should be simple and have low complexity. I used the Split function to put the words in an array and then a for loop to count them. But this increases the complexity, as Split also uses a loop internally. Can you give me some ideas?


Answer:

I presume that you need the solution in C#.NET and that you want the total number of words rather than the number of occurrences of one specific word. The Split function may not be the best fit for counting words, because you would still have to clean up the split strings to handle punctuation and other white space such as tabs and line feeds. The other options are to use a regular expression, or to loop through the characters of the string and find the words yourself.
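To illustrate the white space issue with Split, here is a minimal sketch. The class and method names (SplitPitfall, CountBySpace, CountByWhiteSpace) are made up for this example; only String.Split itself comes from the framework.

```csharp
using System;

public static class SplitPitfall
{
    // Counts the pieces produced by splitting on a single space character.
    public static int CountBySpace(string text)
    {
        return text.Split(' ').Length;
    }

    // Splits on any white space (a null separator array means "white space")
    // and drops the empty pieces created by consecutive separators.
    public static int CountByWhiteSpace(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        // Two spaces in a row produce an empty entry: 4 pieces for 3 words.
        Console.WriteLine(CountBySpace("one  two three"));
        // A tab is not a space, so the whole string is 1 piece for 2 words.
        Console.WriteLine(CountBySpace("one\ttwo"));
        // Splitting on all white space and removing empty entries gives 3 and 2.
        Console.WriteLine(CountByWhiteSpace("one  two three"));
        Console.WriteLine(CountByWhiteSpace("one\ttwo"));
    }
}
```

Even the second variant still leaves punctuation attached to the words, so whether it is good enough depends on what you count as a word.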

The following code gives you the word count using a regular expression:

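// Requires: using System.Text.RegularExpressions;
// Each run of non-white-space characters counts as one word.
MatchCollection collection = Regex.Matches(textBox1.Text, @"\S+");
int x = collection.Count;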

Depending on your definition of a word, the regular expression pattern may have to be changed.
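As a rough illustration of how the pattern changes the result, the sketch below counts the same sentence with two patterns. The helper name Count and the sample sentence are only assumptions for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordPatterns
{
    // Counts the matches of the given pattern in the text.
    public static int Count(string text, string pattern)
    {
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        string text = "Hello, world - it's 3.14";

        // \S+ : every run of non-white-space characters is a word,
        // so the lone "-" counts and "it's" stays one word.
        Console.WriteLine(Count(text, @"\S+"));   // prints 5

        // \w+ : only letters, digits and underscore form words,
        // so "it's" becomes "it" and "s", and "3.14" becomes "3" and "14".
        Console.WriteLine(Count(text, @"\w+"));   // prints 6
    }
}
```

Neither result is wrong as such; pick or adjust the pattern to match what you want to treat as a word.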

Alternatively, the word count can be obtained by treating the string as a character array and looping through it to detect where each word starts or ends. Here again, modify the condition to match your definition of a word. Here is a code sample:


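            int c = 0;
            string s = textBox1.Text;
            // Start at 0 so that a word at the very beginning is also counted.
            for (int i = 0; i < s.Length; i++)
            {
                // A word starts at the first character or right after white space.
                if (i == 0 || char.IsWhiteSpace(s[i - 1]))
                {
                    if (char.IsLetterOrDigit(s[i]) ||
                        char.IsPunctuation(s[i]))
                    {
                        c++;
                    }
                }
            }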
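If you want to compare the speed of the two approaches on your own input, a rough timing harness along the following lines could be used. This is only a sketch: the class name, the method names and the repeated sample sentence are assumptions for this example, and the loop here treats any non-white-space character as the start of a word so that its count matches the \S+ pattern. Actual timings depend on the machine and the input.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class WordCountTiming
{
    // Regular expression approach: one match per run of non-white-space.
    public static int CountWithRegex(string s)
    {
        return Regex.Matches(s, @"\S+").Count;
    }

    // Loop approach: count each position where a word starts, that is,
    // a non-white-space character at index 0 or right after white space.
    public static int CountWithLoop(string s)
    {
        int c = 0;
        for (int i = 0; i < s.Length; i++)
        {
            bool atStart = i == 0 || char.IsWhiteSpace(s[i - 1]);
            if (atStart && !char.IsWhiteSpace(s[i]))
            {
                c++;
            }
        }
        return c;
    }

    public static void Main()
    {
        // Hypothetical sample input: the same sentence repeated many times.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++)
        {
            sb.Append("the quick brown fox jumps over the lazy dog\n");
        }
        string text = sb.ToString();

        Stopwatch sw = Stopwatch.StartNew();
        int regexCount = CountWithRegex(text);
        sw.Stop();
        Console.WriteLine("Regex: {0} words, {1} ms", regexCount, sw.ElapsedMilliseconds);

        sw.Restart();
        int loopCount = CountWithLoop(text);
        sw.Stop();
        Console.WriteLine("Loop:  {0} words, {1} ms", loopCount, sw.ElapsedMilliseconds);
    }
}
```

A single run like this is only indicative; repeating the measurement a few times gives a steadier picture.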


The regular expression approach was found to be marginally slower than the character loop alternative. Hope this helps.