comp.lang.c
[Top] [All Lists]

Re: taking a "word" as input

Subject: Re: taking a "word" as input
From: Nick Keighley
Date: Wed, 30 Apr 2008 07:20:40 -0700 PDT
Newsgroups: comp.lang.c

On 30 Apr, 23:02, arnuld <NoS...@xxxxxxxxxx> wrote:
> > On Tue, 29 Apr 2008 20:53:50 +0530, santosh wrote:

> > What about words with other characters like hyphen? What about
> > constructs like "get_name"? Will you discard them too. What about words
> > that end with a ; or ...? What about words that contain symbols like #@
> > etc? Or words that end with an exclamation mark? Or words within
> > parenthesis or braces?
>
> all of them will be discarded. Only words containing letters like
> "santosh" will be considered, nothing else.
>
> > That's one way yes, suitable when you don't know the lengths of words in
> > advance, or you don't want to possibly waste storage with statically
> > allocated arrays.
>
> yes, exactly, input will be at run-time only.

I don't understand what you mean here


> > Depends on your requirements really, and the type and frequency of input
> > you expect. Will you put an upper limit on the length of words?   It
> > hardly makes sense to accept words longer than about 64 characters if
> > you are dealing with normal English text.
>
> ok, make the upper limit to 64 :) , I usually take it 100 as my style.
>
> > Static 2D arrays are
> > undoubtedly easier to work with but are less flexible than dynamically
> > allocated arrays. Since statically allocated arrays are of fixed size
> > it's possible for some elements to remain unused and hence wasted. OTOH
> > a large number of small allocations may lead to memory fragmentation and
> > also some wastage due to malloc bookeeping and possibly also a slowdown
> > in speed if you'll be reading a very large number of words from a file.
> > For input from a human it will not matter.
>
> you want to say that there will be 2 types of implementations if
> efficiency is my concern:
>
>   1.) input from human
>   2.) input from a text-file
>
>  ??
>
> couldn't there be a single implementation for both types of inputs ?

yes. But file or human might influence your design. People type
v e r y  s l o w l y  so a human input only program doesn't need to
be fast (for this problem). The file input one should work just fine
with people.


> > One efficient method is to use a single dynamically allocated array in
> > which words are stored sequentially. The length of each word could be
> > specified by either one or two bytes prefixing the word itself. This
> > results in very efficient storage, but is grossly inefficient if you
> > want to insert and delete words at random. For this a hash table based
> > approach is probably the best. OTOH a tree is very convenient for quick
> > searching and sorting.
> > If you tell us more details about the type and volume of input you
> > expect and the facilities (like searching, insertion, etc.) you plan to
> > implement, perhaps a tailored approach can be suggested.
>
> The basic problem is to sort, count and print the sorted words.  We are
> not going to save a word in an array if it has already appeared, we will
> just increase the count for that word.  

that didn't really answer the question...

> K&R2 seems to suggest that a  doubly-linked list using binary search is
> the most efficient method to use, described in section 6.5 and is already
> solved. Actually I am not able to understand the <getword> function of the
> authors which actually is different from what I want, hence I need to
> create one of my own.


--
Nick Keighley

<Prev in Thread] Current Thread [Next in Thread>
Privacy Policy