
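Request for improvement: counting how many times a word occurs in a file

Posted: 2013-10-19 20:58:22 Author: rapoo

Request for improvement: code that counts how many times a word occurs in a file
I wrote a program that counts how many times a word occurs in a file, but I am not sure it fully meets the requirement. An occurrence counts whether the word is all lower case, all upper case, or capitalized. I have tested it, but there is surely room for improvement, so any advice is welcome!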

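/* Count the occurrences of a word in a text file */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>

#define LEN 41
#define MAX 256

void to_Upper(char *);         // convert every letter in the string to upper case
bool read_line(char *, int);   // read one line from stdin and drop the newline

int main(void)
{
    char wordtofind[LEN];
    char wordinfile[LEN];
    char tempcap[LEN];     // copy of wordtofind with its first letter capitalized
    char tempupper[LEN];   // copy of wordtofind with every letter in upper case
    char filename[MAX];
    char *wif = wordinfile;
    int ch;
    long count;
    bool state;            // true while we are inside a word
    bool toolong;          // true if the current word does not fit in wordinfile
    FILE *fp;

    puts("Input a word to count (empty line to quit):");
    while (read_line(wordtofind, LEN) && wordtofind[0] != '\0')
    {
        count = 0;
        state = false;
        toolong = false;
        wif = wordinfile;
        // build both variants once per search, not once per word read
        strcpy(tempcap, wordtofind);
        strcpy(tempupper, wordtofind);
        to_Upper(tempupper);
        tempcap[0] = toupper((unsigned char)tempcap[0]);

        puts("Input the name of the file to search the word in:");
        if (!read_line(filename, MAX))
            break;
        if ((fp = fopen(filename, "r")) == NULL)
        {
            fprintf(stderr, "Error opening %s\n", filename);
            exit(EXIT_FAILURE);
        }
        do
        {
            ch = getc(fp);
            if (ch != EOF && isalpha(ch))
            {
                state = true;
                if (wif < wordinfile + LEN - 1)
                    *wif++ = ch;      // append the letter to wordinfile
                else
                    toolong = true;   // too long to equal wordtofind
            }
            else if (state)           // any non-letter, or EOF, ends the word
            {
                *wif = '\0';          // terminate wordinfile
                if (!toolong &&
                    (strcmp(wordtofind, wordinfile) == 0 ||
                     strcmp(tempcap, wordinfile) == 0 ||
                     strcmp(tempupper, wordinfile) == 0))
                    count++;          // the same word was found in the file
                state = false;
                toolong = false;
                wif = wordinfile;     // reset wif for the next word
            }
        } while (ch != EOF);
        fclose(fp);
        printf("%s found %ld %s.\n",
               wordtofind, count, (count == 1) ? "time" : "times");
        puts("Input a word to count (empty line to quit):");
    }
    puts("Done!");
    system("PAUSE");   // Windows only: keeps the console window open

    return 0;
}

void to_Upper(char *str)
{
    // increment separately: *str++ = toupper(*str) has undefined behavior
    for (; *str; str++)
        *str = toupper((unsigned char)*str);
}

bool read_line(char *buf, int size)
{
    // fgets never writes past size bytes, unlike gets
    if (fgets(buf, size, stdin) == NULL)
        return false;
    buf[strcspn(buf, "\n")] = '\0';   // strip the trailing newline, if any
    return true;
}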


I really haven't tried that... Does the isalpha() function recognize only English letters?
Locale
Use the setlocale function to change or query some or all of the current program locale information. “Locale” refers to the locality (the country and language) for which you can customize certain aspects of your program. Some locale-dependent categories include the formatting of dates and the display format for monetary values. For more information, see Locale Categories.

Locale-Dependent Routines


| Routine | Use | setlocale Category Setting Dependence |
| --- | --- | --- |
| atof, atoi, atol | Convert character to floating-point, integer, or long integer value, respectively | LC_NUMERIC |
| is Routines | Test given integer for particular condition | LC_CTYPE |
| isleadbyte | Test for lead byte | LC_CTYPE |
| localeconv | Read appropriate values for formatting numeric quantities | LC_MONETARY, LC_NUMERIC |
| MB_CUR_MAX | Maximum length in bytes of any multibyte character in current locale (macro defined in STDLIB.H) | LC_CTYPE |
| _mbccpy | Copy one multibyte character | LC_CTYPE |
| _mbclen | Return length, in bytes, of given multibyte character | LC_CTYPE |
| mblen | Validate and return number of bytes in multibyte character | LC_CTYPE |
| _mbstrlen | For multibyte-character strings: validate each character in string; return string length | LC_CTYPE |
| mbstowcs | Convert sequence of multibyte characters to corresponding sequence of wide characters | LC_CTYPE |
| mbtowc | Convert multibyte character to corresponding wide character | LC_CTYPE |
| printf functions | Write formatted output | LC_NUMERIC (determines radix character output) |
| scanf functions | Read formatted input | LC_NUMERIC (determines radix character recognition) |
| setlocale, _wsetlocale | Select locale for program | Not applicable |
| strcoll, wcscoll | Compare characters of two strings | LC_COLLATE |
| _stricoll, _wcsicoll | Compare characters of two strings (case insensitive) | LC_COLLATE |
| _strncoll, _wcsncoll | Compare first n characters of two strings | LC_COLLATE |
| _strnicoll, _wcsnicoll | Compare first n characters of two strings (case insensitive) | LC_COLLATE |
| strftime, wcsftime | Format date and time value according to supplied format argument | LC_TIME |
| _strlwr | Convert, in place, each uppercase letter in given string to lowercase | LC_CTYPE |
| strtod, wcstod, strtol, wcstol, strtoul, wcstoul | Convert character string to double, long, or unsigned long value | LC_NUMERIC (determines radix character recognition) |
| _strupr | Convert, in place, each lowercase letter in string to uppercase | LC_CTYPE |
| strxfrm, wcsxfrm | Transform string into collated form according to locale | LC_COLLATE |
| tolower, towlower | Convert given character to corresponding lowercase character | LC_CTYPE |
| toupper, towupper | Convert given character to corresponding uppercase letter | LC_CTYPE |
| wcstombs | Convert sequence of wide characters to corresponding sequence of multibyte characters | LC_CTYPE |
| wctomb | Convert wide character to corresponding multibyte character | LC_CTYPE |
| _wtoi, _wtol | Convert wide-character string to int or long | LC_NUMERIC |
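
To make the LC_CTYPE dependence of the "is" routines concrete, here is a minimal sketch. The helper name is_letter_in is made up for illustration and is not part of any library. In the default "C" locale, isalpha() accepts only the 26 unaccented letters in each case, so a byte such as 0xE9 (é in Latin-1) is rejected. What other locales accept depends on which locales are installed on the system, so the sketch makes no claim about them.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: is byte c a letter under the locale called name?
   Returns 1 or 0, or -1 if the locale cannot be selected.
   Side effect: the program's LC_CTYPE setting stays changed afterwards. */
int is_letter_in(const char *name, int c)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;                       /* locale not available */
    /* the cast keeps the argument in the range isalpha() requires */
    return isalpha((unsigned char)c) != 0;
}
```

Calling is_letter_in("", c) would test against the user's environment locale instead, since an empty locale name selects it.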


[Solution]

Quote:

I suggest opening the file once and reusing the same file handle for every search. You only need to move the file position back to the beginning before each search, which fseek can do.
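
A rough sketch of that suggestion, assuming the caller keeps one FILE * open across searches. The function name count_word and the 41-byte buffer are assumptions chosen to mirror the original program, and the match here is exact (case-sensitive) to keep the example short.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Count exact occurrences of word in an already open stream.
   The stream is repositioned to its start on every call. */
long count_word(FILE *fp, const char *word)
{
    char buf[41];          /* same capacity as LEN in the original program */
    size_t len = 0;        /* letters seen in the current word */
    long count = 0;
    int ch;

    rewind(fp);            /* like fseek(fp, 0L, SEEK_SET), and also clears EOF */
    do {
        ch = getc(fp);
        if (ch != EOF && isalpha(ch)) {
            if (len < sizeof buf - 1)
                buf[len] = (char)ch;     /* store only what fits */
            len++;
        } else if (len > 0) {            /* a non-letter or EOF ends the word */
            if (len < sizeof buf) {      /* overlong words cannot match */
                buf[len] = '\0';
                if (strcmp(buf, word) == 0)
                    count++;
            }
            len = 0;
        }
    } while (ch != EOF);
    return count;
}
```

With this shape, the program would only call fopen again when the user names a different file.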


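Quote:
1. Compare with strcasecmp instead: it is case-insensitive (it ignores case).
2. Avoid opening and closing the file repeatedly; reposition within the open file instead!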
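
A note on point 1: strcasecmp is a POSIX function declared in <strings.h>, not part of ISO C, and Microsoft's runtime offers _stricmp instead. Where neither is wanted, a portable equivalent can be sketched as below. The name equal_ignore_case is made up for this example. Like strcasecmp, it also accepts mixed-case forms such as "wOrD", which is broader than the three forms the original program counts.

```c
#include <ctype.h>

/* Return 1 if a and b are equal when letter case is ignored, else 0. */
int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        /* cast to unsigned char before tolower, as <ctype.h> requires */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;       /* equal only if both strings end together */
}
```

With such a comparison, the tempcap and tempupper copies in the original program would no longer be needed.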


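I did think about the repeated opening and closing. I do it this way so that the user can search for a different word in a different file each time. Also, may I ask what the drawbacks of frequently opening and closing a file are?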


It wastes resources.

Opening and closing the file once per loop iteration, and repeating that, shouldn't waste much, should it?


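When you close a file, the contents held in memory are written back to disk, and when you open it again they have to be read from disk. As the OP, you surely know how much slower disk reads are than memory.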
