is C necessarily faster than C++

Newsgroups: comp.compilers
From: ka@socrates.hr.att.com (Kenneth Almquist)
Keywords: C, C++, performance
Organization: AT&T
References: 95-04-044
Date: Fri, 28 Apr 1995 19:32:54 GMT

Here are some timings for matrix addition. C doesn't support multi-
dimensional arrays unless the array bounds are known at compile time,
so the simplest way to code the matrix addition routine is to treat
the matrices as one-dimensional arrays:


        void add(double *a, double *b, int dim1, int dim2) {
                int i;

                for (i = dim1 * dim2 ; --i >= 0 ; )
                        *b++ += *a++;
        }


When this is compiled using GCC on a Pentium, it takes 2.90ms to add
two 100x100 matrices. On an HP PA-RISC machine using the native
compiler, it takes 0.638ms.
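

For reference, a driver along the following lines can produce numbers of
this kind. The matrix contents, the repetition count, and the use of
clock() are my assumptions; the article does not show its measurement
setup.


        /* Minimal timing harness (a sketch, not the original measurement
           setup).  Times the one-dimensional add() routine above on two
           100x100 matrices and reports milliseconds per call. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        void add(double *a, double *b, int dim1, int dim2);

        int main(void) {
                enum { DIM = 100, REPS = 1000 };
                double *a = malloc(DIM * DIM * sizeof *a);
                double *b = malloc(DIM * DIM * sizeof *b);
                clock_t start, stop;
                int i;

                for (i = 0 ; i < DIM * DIM ; i++) {
                        a[i] = 1.0;
                        b[i] = 2.0;
                }
                start = clock();
                for (i = 0 ; i < REPS ; i++)
                        add(a, b, DIM, DIM);
                stop = clock();
                printf("%.3f ms per call\n",
                       1000.0 * (double)(stop - start) / CLOCKS_PER_SEC / REPS);
                free(a);
                free(b);
                return 0;
        }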


In C++ it's good style to use a matrix class. Rather than present C++
code, I'll stick with C in this article. The reason is that the C++
compiler on the HP is cfront, which actually generates C code, and the
C++ compiler I have on the Pentium is g++, which shares a common back
end with gcc. So we can estimate the performance of C++ just by looking
at how the C compilers handle various pieces of C code.
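

For concreteness, an element accessor of roughly the following shape is
the kind of C code a matrix class's element operator turns into; the name
and layout are my own illustration, not actual cfront output.


        /* Illustration only: the sort of accessor a matrix class element
           operator might translate to in C.  Row-major storage, 1-based
           indices as in the loops below. */
        double *mat_elem(double *m, int dim2, int i, int j) {
                return &m[(i - 1) * dim2 + (j - 1)];
        }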


With a matrix class, all the matrix references are done by subroutine
calls. Inlining the subroutine calls by hand gives us the following
code:


        void add(double *a, double *b, int dim1, int dim2) {
                int i, j;

                for (i = 1 ; i <= dim1 ; i++) {
                        for (j = 1 ; j <= dim2 ; j++) {
                                b[(i - 1) * dim2 + (j - 1)] +=
                                        a[(i - 1) * dim2 + (j - 1)];
                        }
                }
        }


This subroutine has a lot more operations than the previous version,
but the compilers improve things a lot. GCC applies common subexpression
elimination to the subscripts, and moves the multiplication out of the
inner loop. However, it uses a single subscript variable rather than
having one pointer for each array (as the original code does). It takes
3.20ms, which is about 10% slower than the previous version. The HP C
compiler does better, taking 0.653ms (2.4% slower).
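

The effect of those optimizations is roughly as if the source had been
written with the multiplication hoisted out of the inner loop and a single
shared subscript. The following is my reconstruction of what the optimized
code is equivalent to, not actual compiler output.


        /* Approximately what the compilers reduce the inlined version to:
           (i - 1) * dim2 is computed once per row, and the inner loop uses
           a single subscript for both arrays (a reconstruction, not
           compiler output). */
        void add(double *a, double *b, int dim1, int dim2) {
                int i, j, row, k;

                for (i = 1 ; i <= dim1 ; i++) {
                        row = (i - 1) * dim2;
                        for (j = 1 ; j <= dim2 ; j++) {
                                k = row + (j - 1);
                                b[k] += a[k];
                        }
                }
        }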


Using a matrix class means that each matrix has its own dimension values,
even though both matrices happen to be the same size in this instance.


        void add(double *a, int a_dim1, int a_dim2,
                 double *b, int b_dim1, int b_dim2) {
                int i, j;

                for (i = 1 ; i <= a_dim1 ; i++) {
                        for (j = 1 ; j <= a_dim2 ; j++) {
                                b[(i - 1) * b_dim2 + (j - 1)] +=
                                        a[(i - 1) * a_dim2 + (j - 1)];
                        }
                }
        }


This runs at exactly the same speed as the preceding version on the HP,
but the Pentium doesn't have enough registers to hold the extra dimension
values, and the execution time slows to 3.50ms.


The other thing that happens with a C++ matrix class is that the dimensions
are stored in the class, which in C we represent as a structure.


        struct matinfo {
                double *addr;
                int dim1;
                int dim2;
        };


This structure is passed to the addition routine.


        void add(const struct matinfo *a, const struct matinfo *b) {
                int i, j;

                for (i = 1 ; i <= a->dim1 ; i++) {
                        for (j = 1 ; j <= a->dim2 ; j++) {
                                b->addr[(i - 1) * b->dim2 + (j - 1)] +=
                                        a->addr[(i - 1) * a->dim2 + (j - 1)];
                        }
                }
        }


Now that the dim2 values are stored in memory, it is difficult to determine
when they change, due to the possibility of aliasing. Like most compilers,
GCC simply assumes that the dim2 values may be aliased. As a result, the
subscript calculations are done entirely in the inner loop, and the execution
time increases to 4.89ms. On the HP, the results are more dramatic. The
execution time increases to 6.96ms, which is slower than the other times by
more than a factor of 10. With RISC processors (like the HP), there is
typically a large penalty for not using registers effectively. The Pentium
has only a few on-chip registers, relying instead on a fast on-chip cache.
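

One way a programmer (or a smarter front end) can sidestep the aliasing
assumption is to copy the dimensions and base pointers into locals before
the loops, so the compiler can see that they cannot change across the array
stores. A sketch of that rewrite of the struct version (my workaround, not
code from the original measurements):


        /* The struct version with dimensions and base pointers copied into
           locals up front, removing the need for the compiler to reload
           them inside the loops (a sketch of one possible workaround). */
        void add(const struct matinfo *a, const struct matinfo *b) {
                double *ap = a->addr, *bp = b->addr;
                int a_dim2 = a->dim2, b_dim2 = b->dim2;
                int dim1 = a->dim1, dim2 = a->dim2;
                int i, j;

                for (i = 1 ; i <= dim1 ; i++) {
                        for (j = 1 ; j <= dim2 ; j++) {
                                bp[(i - 1) * b_dim2 + (j - 1)] +=
                                        ap[(i - 1) * a_dim2 + (j - 1)];
                        }
                }
        }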


In short, optimizing back ends designed for C are not really adequate for
C++.
Kenneth Almquist
--

