阶乘函数的优化问题
int fact(int n)性能优化 C 阶乘
{
int i;
int result = 1;
for(i = n; i > 1; i--)
result = result * i;
return result;
}
int fact_u2(int n)
{
int i;
int result = 1;
for(i = n; i > 1; i -= 2)
result = (result * i) * (i - 1);
return result;
}
int fact_u3(int n)
{
int i;
int result = 1;
for(i = n; i > 1; i -= 2)
result = result * (i * (i - 1));
return result;
}
为什么fact_u2相对fact性能没有改进,而使用fact_u3性能改进?谢谢
[解决办法]
手工优化有没有效果,要在编译器优化编译之后去判断。在VS2008中以release编译,结果fact_u2与fact_u3被合并成了一个函数,而且优化效果并不明显:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int fact(int n)
{
int i;
int result = 1;
for(i = n; i > 1; i--)
result = result * i;
return result;
}
int fact_u2(int n)
{
int i;
int result = 1;
for(i = n; i > 1; i -= 2)
result = (result * i) * (i - 1);
return result;
}
int fact_u3(int n)
{
int i;
int result = 1;
for(i = n; i > 1; i -= 2)
result = result * (i * (i - 1));
return result;
}
#define CALLTIMES 1000000000
int main()
{
clock_t start;
int i, value;
value = 12345678;
start = clock();
for( i = 0; i < CALLTIMES; ++ i)
{
value += value * fact(12);
}
start = clock()- start;
printf("value:%d\tfact:%f\n", value, (double)start / CLOCKS_PER_SEC);
value = 12345678;
start = clock();
for( i = 0; i < CALLTIMES; ++ i)
{
value += value * fact(12);
}
start = clock()- start;
printf("value:%d\tfact_u2:%f\n", value, (double)start / CLOCKS_PER_SEC);
value = 12345678;
start = clock();
for( i = 0; i < CALLTIMES; ++ i)
{
value += value * fact(12);
}
start = clock()- start;
printf("value:%d\tfact_u3:%f\n", value, (double)start / CLOCKS_PER_SEC);
printf("fact_u2:%x\tfact_u3:%x\n", &fact_u2,&fact_u3);
}
输出结果:
value:-292789938 fact:1.515000
value:-292789938 fact_u2:1.344000
value:-292789938 fact_u3:1.406000
fact_u2:401030 fact_u3:401030
把返回值都改为double之后,fact_u2与fact_u3没有被合并.下面是返回double并用改求60的阶乘的运行结果:
value:-2147483648 fact:14.813000
value:-2147483648 fact_u2:15.234000
value:-2147483648 fact_u3:15.063000
fact_u2:4010a0 fact_u3:4010e0
所以说,在C代码中去考虑指令级优化是没有意义的,因为你不知道编译器会如何生成代码。要么交给编译器去优化,要么直接写汇编代码。
[解决办法]
问了一下, fact2和fact3相比优化少一些, 一处无法并发需要等待:
这是2的:
x1 = result*i
y1 = x1*(i-1);
x2 = y1*(i-2);
y2 = x2*(i-2-1);
这是3的:
x1=i*(i-1)
y1=result*x1
x2=(i-2)*(i-1-2)
y2= y1*x2
你看2中x2的计算是依赖y1的, 而3中y1和x2可以同时计算.