
Posted: 2012-09-14 11:53:44 Author: rapoo

[Just a small program, a small function] What simple methods or functions are commonly used in C to validate a file name?

C/C++ code
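#include <stdbool.h> /* bool, needed when compiling as C */
#include <stdio.h>
//#include <stdlib.h>
#include <string.h>

/* ASCII codes of the characters / \ : * ? " < > | */
#define ERR_CHAR 47,92,58,42,63,34,60,62,124

int fcmp(char *arg1);

int main(int argc, char *argv[])
{
    bool qt = 1;
    while (qt)
    {
        if (argc != 2)
        {
            printf("Wrong number of arguments!\n");
            qt = 0;
        }
        else
        {
            if (!fcmp(argv[1]))
            {
                printf("File name is valid!\n"); /* code that continues with a valid name goes here */
                printf("File name: %s", argv[1]);
                qt = 0;
            }
            else
            {
                printf("File name is not valid!"); /* handling for an invalid name goes here */
                qt = 0;
            }
        }
    }
    return 0;
}

/* Check whether the file name contains one of the special characters / \ : * ? " < > |
   Returns 0 if the name is fine, 1 if a special character was found, 2 if it is too long. */
int fcmp(char *arg1)
{
    size_t arg1len;
    char arg2[9] = {ERR_CHAR};
    arg1len = strlen(arg1);
    if (arg1len > 100) /* limit the name length, although path plus file name can exceed two hundred */
        return 2;
    for (size_t i = 0; i < arg1len; i++)
    {
        for (int j = 0; j < 9; j++)
        {
            if (arg1[i] == arg2[j])
            {
                return 1;
            }
        }
    }
    return 0;
}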


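I've recently been reading C Primer Plus and have only reached chapter 12, so I'm still a beginner. While reading I keep wanting to build something of my own to play with, and what I'd like to make is a small program that reads a file and saves it elsewhere. It occurred to me that some characters are not allowed in file names, hence the code above. I ran into a few problems while writing that function, so I'd like to hear everyone's opinions. Of course, when the name is typed at the cmd command prompt, cmd's own restrictions apply: cmd handles the characters < > " itself before the program sees them, so the program as written certainly has some problems too (the characters it actually receives have already been processed, although this can be checked by reading the file name from a file instead).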

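The first problem is the line “char arg2[9]={47,92,58,42,63,34,60,62,124};”. I originally wrote it as “char arg2[9]={'/','\',':','*','?','"','<','>','|'};”, but the compiler reported a definition mismatch error, so I changed the characters to their corresponding ASCII codes. I know that 00 in the ASCII table corresponds to '\0', but I don't know how to write these other characters. That is the main reason for this post: I'd like to ask how special characters are compared, defined, and used in C.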

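The second problem is that I don't know whether C has a good function for checking that a file name is valid. I suspect my function may have trouble with file names containing Unicode characters and could produce false positives. The few Chinese characters I tested at random did not fail, but I really don't know how Unicode encoding works; I only know that it exists and that it has roughly 65535 code values, if I remember correctly. So I'd appreciate recommendations for a document that explains Unicode characters. I only want a rough idea of how Unicode characters relate to their hexadecimal codes, not the detailed mapping, which would be going much too far, heh...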
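On the second question: standard C has no function that validates file names, but strpbrk from <string.h> can replace the nested loops. The following is only a minimal sketch, not a complete validator. The helper name check_file_name and its max_len parameter are hypothetical, and the character list is the one from the post above; the return codes mirror those of fcmp.

```c
#include <assert.h>
#include <string.h>

/* Characters to reject, taken from the list in the post above. */
static const char FORBIDDEN_CHARS[] = "/\\:*?\"<>|";

/* Hypothetical helper: returns 0 if the name is acceptable, 1 if it
 * contains a forbidden character, 2 if it is longer than max_len bytes. */
int check_file_name(const char *name, size_t max_len)
{
    assert(name != NULL);
    if (strlen(name) > max_len)
        return 2;
    /* strpbrk returns a pointer to the first byte of name that also
     * occurs in FORBIDDEN_CHARS, or NULL when there is none. */
    return strpbrk(name, FORBIDDEN_CHARS) != NULL ? 1 : 0;
}
```

With this sketch, check_file_name("report.txt", 100) returns 0 and check_file_name("a<b", 100) returns 1.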

[Solution]
Special characters are well suited to being expressed as a binary byte stream, unsigned char[].
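One way to read this suggestion is to treat the file name as raw bytes. As a rough sketch, assuming the name is encoded in UTF-8: every byte of a multi-byte UTF-8 character is 0x80 or above, so it cannot be mistaken for one of the ASCII special characters. This does not hold for legacy double-byte encodings such as GBK, where the second byte of a character can equal 0x5C (the backslash). Which encoding argv actually arrives in depends on the platform. The helper name count_non_ascii_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the bytes of s whose value is 0x80 or above.
 * Reading through unsigned char avoids negative values on platforms
 * where plain char is signed. */
size_t count_non_ascii_bytes(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
    {
        if (*p >= 0x80)
            n++;
    }
    return n;
}
```

For example, the UTF-8 encoding of 中 is the three bytes E4 B8 AD, so count_non_ascii_bytes("\xE4\xB8\xAD.txt") returns 3, and none of those bytes matches an ASCII code below 0x80.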
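[Solution]
char arg2[9]={'/','\\',':','*','?','\"','<','>','|'};

Special symbols such as 【\】, 【'】 and 【"】 need a preceding \ as an escape when written directly in code, which gives 【\\】, 【\'】 and 【\"】. Strictly speaking, \ must always be escaped, ' only inside a character constant, and " only inside a string literal; writing '\"' as above is still valid.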
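To tie this back to the question, here is a small sketch showing that the escaped character constants hold the same values as the numeric codes in ERR_CHAR, assuming an ASCII execution character set. The names err_chars and is_err_char are hypothetical.

```c
#include <stddef.h>

/* The nine characters from the question, written as character constants.
 * Only the backslash needs an escape here; '"' and '\"' are the same value. */
static const char err_chars[9] = {'/', '\\', ':', '*', '?', '"', '<', '>', '|'};

/* Hypothetical helper: returns 1 if c is one of the nine characters, else 0. */
int is_err_char(char c)
{
    for (size_t i = 0; i < sizeof err_chars; i++)
    {
        if (c == err_chars[i])
            return 1;
    }
    return 0;
}
```

Under the ASCII assumption, is_err_char('\\') and is_err_char(92) both return 1, so the table of character constants can stand in for the table of numeric codes.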

[Solution]
C++ Character Constants
Character constants are one or more members of the “source character set,” the character set in which a program is written, surrounded by single quotation marks ('). They are used to represent characters in the “execution character set,” the character set on the machine where the program executes.

Microsoft Specific

For Microsoft C++, the source and execution character sets are both ASCII.

END Microsoft Specific

There are three kinds of character constants:

Normal character constants
Multicharacter constants
Wide-character constants

Note: Use wide-character constants in place of multicharacter constants to ensure portability.

Character constants are specified as one or more characters enclosed in single quotation marks. For example:

char ch = 'x'; // Specify normal character constant.
int mbch = 'ab'; // Specify system-dependent multicharacter constant.
wchar_t wcch = L'ab'; // Specify wide-character constant.

Note that mbch is of type int. If it were declared as type char, the second byte would not be retained. A multicharacter constant has four meaningful characters; specifying more than four generates an error message.

Syntax

character-constant :
    'c-char-sequence'
    L'c-char-sequence'

c-char-sequence :
    c-char
    c-char-sequence c-char

c-char :
    any member of the source character set except the single quotation mark ('), backslash (\), or newline character
    escape-sequence

escape-sequence :
    simple-escape-sequence
    octal-escape-sequence
    hexadecimal-escape-sequence

simple-escape-sequence : one of
    \' \" \? \\
    \a \b \f \n \r \t \v

octal-escape-sequence :
    \octal-digit
    \octal-digit octal-digit
    \octal-digit octal-digit octal-digit

hexadecimal-escape-sequence :
    \xhexadecimal-digit
    hexadecimal-escape-sequence hexadecimal-digit

Microsoft C++ supports normal, multicharacter, and wide-character constants. Use wide-character constants to specify members of the extended execution character set (for example, to support an international application). Normal character constants have type char, multicharacter constants have type int, and wide-character constants have type wchar_t. (The type wchar_t is defined in the standard include files STDDEF.H, STDLIB.H, and STRING.H. The wide-character functions, however, are prototyped only in STDLIB.H.)

The only difference in specification between normal and wide-character constants is that wide-character constants are preceded by the letter L. For example:

char schar = 'x'; // Normal character constant
wchar_t wchar = L'\x81\x19'; // Wide-character constant

Table 1.2 shows reserved or nongraphic characters that are system dependent or not allowed within character constants. These characters should be represented with escape sequences.

Table 1.2 C++ Reserved or Nongraphic Characters

Character               ASCII Representation   ASCII Value   Escape Sequence
Newline                 NL (LF)                10 or 0x0a    \n
Horizontal tab          HT                     9             \t
Vertical tab            VT                     11 or 0x0b    \v
Backspace               BS                     8             \b
Carriage return         CR                     13 or 0x0d    \r
Formfeed                FF                     12 or 0x0c    \f
Alert                   BEL                    7             \a
Backslash               \                      92 or 0x5c    \\
Question mark           ?                      63 or 0x3f    \?
Single quotation mark   '                      39 or 0x27    \'
Double quotation mark   "                      34 or 0x22    \"
Octal number            ooo                    -             \ooo
Hexadecimal number      hhh                    -             \xhhh
Null character          NUL                    0             \0


If the character following the backslash does not specify a legal escape sequence, the result is implementation defined. In Microsoft C++, the character following the backslash is taken literally, as though the escape were not present, and a level 1 warning (“unrecognized character escape sequence”) is issued.

Octal escape sequences, specified in the form \ooo, consist of a backslash and one, two, or three octal characters. Hexadecimal escape sequences, specified in the form \xhhh, consist of the characters \x followed by a sequence of hexadecimal digits. Unlike octal escape constants, there is no limit on the number of hexadecimal digits in an escape sequence.

Octal escape sequences are terminated by the first character that is not an octal digit, or when three characters are seen. For example:

wchar_t och = L'\076a'; // Sequence terminates at a
char ch = '\233'; // Sequence terminates after 3 characters

Similarly, hexadecimal escape sequences terminate at the first character that is not a hexadecimal digit. Because hexadecimal digits include the letters a through f (and A through F), make sure the escape sequence terminates at the intended digit.
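The termination rules above can be illustrated with a short sketch, assuming an ASCII execution character set. The function names are hypothetical. The hexadecimal example splits the string into two adjacent literals, which is one common way to end a hex escape before a character that would otherwise be read as another hex digit.

```c
#include <string.h>

/* Returns "A1": the octal escape \101 ends after three digits,
 * so the trailing 1 is an ordinary character. */
const char *octal_example(void)
{
    return "\1011";
}

/* Returns "ABC": the literal is split so that the hex escape \x41 ends
 * there; the compiler joins adjacent literals after processing escapes. */
const char *hex_example(void)
{
    return "\x41" "BC";
}
```

Here strlen(hex_example()) is 3. Along the same lines, '\057' and '\x2f' are both alternative spellings of '/'.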

Because the single quotation mark (') encloses character constants, use the escape sequence \' to represent enclosed single quotation marks. The double quotation mark (") can be represented without an escape sequence. The backslash character (\) is a line-continuation character when placed at the end of a line. If you want a backslash character to appear within a character constant, you must type two backslashes in a row (\\). (See Phases of Translation in the Preprocessor Reference for more information about line continuation.)



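[Solution]
File names can still contain < and >, at least on the file systems bash normally runs on. The reason you think they are not allowed is the shell's behavior: it uses < and > for redirection.
In bash:
touch '<'
rm '<'

You are just getting started, so building a solid foundation matters. Search more about the things you are unsure of and don't go by first impressions; correcting a mistaken idea later on is a lot of trouble.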
