Arrays and Pointers


How Arrays Are Stored in Memory

The elements of an array are effectively a series of individual variables of the array data type, stored one after the other in the computer's memory. Like all other variables, each element has an address. The address of an array is the address of its first element, which is the address of the first byte occupied by the array.

A picture can help illustrate this. Let's say we declare an array of six integers like this:

   int scores[6] = {85, 79, 100, 95, 68, 89};

Let's assume that the first element of the array scores has the address 1000. This means that &scores[0] would be 1000. If an int occupies 4 bytes (as it does on our system), the address of the second element, &scores[1], would be 1004, the address of the third element, &scores[2], would be 1008, and so on.

[Figure: Array storage]

Array names in C++ have a special property. Much of the time when you use an array name, the compiler converts the name into a pointer to the first element of the array (the C++ standard calls this the array-to-pointer conversion). A common way to describe when this happens is that an array name is converted to a pointer to the first element when it is used in a value context, as opposed to an object context.

A value context is when you're retrieving and subsequently using the value of an object, such as in a cout statement:

   // Prints the address of the first element of the array because scores is being used
   // in a value context and is therefore converted into a pointer to the first array
   // element

   cout << "Address of first element of scores is " << scores << endl;

(Note that the use of the word "object" above does not necessarily mean an object of a class; it's simply a generic term in computing for a region of storage that contains a value or group of values.)

Here are some more examples of when an array name is used in a value context:

An object context is when you're modifying or querying the object itself, not the value it contains. Using a variable on the left side of an assignment statement is an example of using it in an object context. However, most of the operators that use variables in an object context are not defined for array types and don't actually work:

There are a few places where an array name can legally be used in an object context:

Passing Arrays to Functions

As mentioned above, passing an unsubscripted array name as an argument to a function or method converts the array name into a pointer to the first element of the array. The array is therefore passed by address, so the function or method that receives it can change the values stored in the original array.

For example:

   #include <iostream>

   using std::cout;
   using std::endl;

   //prototype: note the notation for an array of int: int[]
   void incrementArray(int[]);

   int main()
      {
      int i;

      int numbers[4] = {1, 2, 3, 4};

      incrementArray(numbers);

      for (i = 0; i < 4; i++)
         cout << numbers[i] << " ";      // Prints 2 3 4 5
      cout << endl;

      return 0;
      }

   void incrementArray(int a[])
      {
      int i;
  
      for (i = 0; i < 4; i++)
         a[i]++;     // this alters values in numbers in main()
      }

Since the unsubscripted array name numbers is converted into a pointer to the first element of the array when passed as an argument to the incrementArray() function, we can even use the notation for "pointer to an int" as the data type of the argument instead of the notation for "array of int":

   #include <iostream>

   using std::cout;
   using std::endl;

   //prototype: note the notation for a pointer to an int: int*
   void incrementArray(int*);

   int main()
      {
      int i;

      int numbers[4] = {1, 2, 3, 4};

      incrementArray(numbers);

      for (i = 0; i < 4; i++)
         cout << numbers[i] << " ";      // Prints 2 3 4 5
      cout << endl;

      return 0;
      }

   void incrementArray(int* a)
      {
      int i;
  
      for (i = 0; i < 4; i++)
         a[i]++;     // this alters values in numbers in main()
      }

As you can see, we can still use the subscript operator with a, even though it has been defined as a "pointer to an int" instead of as an "array of int". In fact, it is legal syntax in C++ to use the subscript operator with any pointer, regardless of whether or not it actually points to an array element. However, if the pointer doesn't in fact point to an array element, it's usually a bad idea!

Passing Array Elements to Functions

If we pass a single element of an array as an argument to a function, it is typically passed by value (i.e. a copy is passed) by default since an element of an array is normally a simple data item, not an array. Imagine a new function, using numbers as declared above, which has the following prototype

   void fn(int);

and is called like this:

   fn(numbers[i]);

Here, fn() cannot alter numbers[i] no matter what it does, since numbers[i] is a simple integer (or whatever type numbers was declared as).

Of course, if we really needed the function to alter numbers[i], we could change the prototype and function definition so that the argument was passed by reference instead of by value.

Pointer Arithmetic

Incrementing a pointer variable makes it point at the address of the next item of its data type. The number of bytes actually added to the address stored in the pointer depends on how many bytes an instance of that data type occupies in memory. On our Unix system, for example, an int occupies 4 bytes, so incrementing a pointer to an int adds 4 to the address it holds.

But the important idea is that the pointer now points to a new address, exactly where the next occurrence of that data type would be. This is exactly what you want when you write code that loops through an array - just increment a pointer and you are pointing at the next element. You don't even have to know how big the data type is - C++ knows. In fact you could take code that works for our Unix system (with 4-byte integers) that uses a pointer variable to loop through an array and recompile it on a system that uses 2-byte or 8-byte integers and the code would still work just fine.

So given a pointer to a memory location, you can add or subtract to/from it and make it point to a different place in memory. For example, if we declare the following array

   int scores[6] = {85, 79, 100, 95, 68, 89};

which can be visualized like this

[Figure: Array storage]

Notice that these expressions are always of the form (ptr-to-something + int). The "ptr-to-something" part of the expression is sometimes called the base address and the integer added to it is called the offset.

So:

   *(scores + 2) = 90;

changes the third array element from 100 to 90. It's the exact same effect as the subscript notation:

   scores[2] = 90;

So in general, if ar is the name of an array and i is an int,

   *(ar + i)  and  ar[i]

are alternate ways to reference the same element. The following table shows the values and data types of various expressions using the array name scores:

Expression    Value   Data Type      Expression     Value   Data Type
scores[0]     85      int            *(scores+0)    85      int
scores[1]     79      int            *(scores+1)    79      int
scores[2]     100     int            *(scores+2)    100     int
scores[3]     95      int            *(scores+3)    95      int
scores[4]     68      int            *(scores+4)    68      int
scores[5]     89      int            *(scores+5)    89      int
&scores[0]    1000    int*           (scores+0)     1000    int*
&scores[1]    1004    int*           (scores+1)     1004    int*
&scores[2]    1008    int*           (scores+2)     1008    int*
&scores[3]    1012    int*           (scores+3)     1012    int*
&scores[4]    1016    int*           (scores+4)     1016    int*
&scores[5]    1020    int*           (scores+5)     1020    int*
*scores       85      int            *scores+1      86      int
scores        1000    int*           &scores        1000    int (*)[6]

Processing Arrays Using Pointer Arithmetic

Pointer arithmetic is frequently used when looping through arrays, especially C strings. For example, suppose we want to write a function to change a C string to all uppercase.

   char s[80] = "some stuff";
   ...
   str_toupper(s);    //the calling statement; passes address of s[0]

We could easily write this function using the familiar subscript notation for accessing an array:

Version 1: Subscript Notation

   void str_toupper(char str[])
      {
      int i;

      for (i = 0; str[i] != '\0'; i++)
         str[i] = toupper(str[i]);
      }

However, since the unsubscripted name of an array used as a function argument is converted by the compiler into a pointer to the first element of the array, we could just as easily treat the incoming argument as a pointer to a char and write the function using pointer notation:

Version 2: "Base Address Plus Offset" Pointer Notation

   void str_toupper(char* str)
      {
      int i;

      for (i = 0; *(str + i) != '\0'; i++)
         *(str + i) = toupper(*(str + i));
      }

The above code is practically identical to the subscript notation version, which should not be surprising if you keep in mind that str[i] and *(str+i) produce exactly the same effect.

There's another way to write the function using pointer notation - rather than adding an integer offset i to the base address in str, we can copy the address in str into a pointer to a char and then alter that address to move from one element of the array to the next:

Version 3a: Pointer Arithmetic Notation

   void str_toupper(char* str)
      {
      char* ptr;

      for (ptr = str; *ptr != '\0'; ptr++)
         *ptr = toupper(*ptr);
      }

OK, let's break down what's going on in this code:

Of course, since str is itself a pointer, we can avoid using the extra variable ptr by just altering the address stored in str:

Version 3b: Pointer Arithmetic Notation

   void str_toupper(char* str)
      {
      for (; *str != '\0'; str++)
         *str = toupper(*str);
      }

As a C++ programmer, you can choose whichever representation suits you best. Most beginners prefer the subscript notation. However, most of the standard C library string functions use pointer notation, so understanding the pointer notation is helpful. It is best not to mix notations unless you have a specific good reason to do so.