comp.lang.c
[Top] [All Lists]

Re: taking a "word" as input

Subject: Re: taking a "word" as input
From: arnuld
Date: Thu, 01 May 2008 03:02:23 +0500
Newsgroups: comp.lang.c

> On Tue, 29 Apr 2008 20:53:50 +0530, santosh wrote:


> What about words with other characters like hyphen? What about
> constructs like "get_name"? Will you discard them too. What about words
> that end with a ; or ...? What about words that contain symbols like #@
> etc? Or words that end with an exclamation mark? Or words within
> parenthesis or braces?

all of them will be discarded. Only words containing letters like
"santosh" will be considered, nothing else.


 

> That's one way yes, suitable when you don't know the lengths of words in
> advance, or you don't want to possibly waste storage with statically
> allocated arrays.

yes, exactly, input will be at run-time only.

 
> Depends on your requirements really, and the type and frequency of input
> you expect. Will you put an upper limit on the length of words?   It
> hardly makes sense to accept words longer than about 64 characters if
> you are dealing with normal English text. 

ok, make the upper limit to 64 :) , I usually take it 100 as my style.



> Static 2D arrays are
> undoubtedly easier to work with but are less flexible than dynamically
> allocated arrays. Since statically allocated arrays are of fixed size
> it's possible for some elements to remain unused and hence wasted. OTOH
> a large number of small allocations may lead to memory fragmentation and
> also some wastage due to malloc bookeeping and possibly also a slowdown
> in speed if you'll be reading a very large number of words from a file.
> For input from a human it will not matter.


you want to say that there will be 2 types of implementations if
efficiency is my concern:

  1.) input from human
  2.) input from a text-file
 
 ??

couldn't there be a single implementation for both types of inputs ?


> One efficient method is to use a single dynamically allocated array in
> which words are stored sequentially. The length of each word could be
> specified by either one or two bytes prefixing the word itself. This
> results in very efficient storage, but is grossly inefficient if you
> want to insert and delete words at random. For this a hash table based
> approach is probably the best. OTOH a tree is very convenient for quick
> searching and sorting.
 
> If you tell us more details about the type and volume of input you
> expect and the facilities (like searching, insertion, etc.) you plan to
> implement, perhaps a tailored approach can be suggested.


The basic problem is to sort, count and print the sorted words.  We are
not going to save a word in an array if it has already appeared, we will
just increase the count for that word.  

K&R2 seems to suggest that a  doubly-linked list using binary search is 
the most efficient method to use, described in section 6.5 and is already
solved. Actually I am not able to understand the <getword> function of the
authors which actually is different from what I want, hence I need to
create one of my own.





-- 
http://lispmachine.wordpress.com/
my email ID is at the above address


<Prev in Thread] Current Thread [Next in Thread>
Privacy Policy