Binary Search

 
The binary search is an algorithm of searching, used with the sorted data. As we have sorted elements in the array, binary search method can be employed to find data in the array. The binary search finds an element in the sorted array in log n time. If we have 100000 elements in the array, the log 1000000 will be 20 i.e. very small as compared to 100000. Thus binary search is very fast.
The binary search is like looking up a phone number in the directory or looking up a word in the dictionary. For looking a word in the dictionary, we start from the middle in the dictionary. If the word that we are looking for comes before words on the page, it shows that the word should be before this page. So we look in the first half. Otherwise, we search for the word in the second half of the dictionary. Suppose the word is in the first half of the dictionary, we consider first half for looking the word. We have no need to look into the second half of the dictionary. Thus the data to be searched becomes half in a step. Now we divide this portion into two halves and look for the word. Here we again come to know that the word is in the first half or in the second half of this portion. The same step is repeated with the part that contains the required word. Finally, we come to the page where the required word exists. We see that in the binary search, the data to be searched becomes half in each step. And we find the entry very fast. The number of maximum steps needed to find an entry is log n, where n is the total number of entries. Now if we have 100000 entries, the maximum number of attempts (steps) required to find out the entry will be 20 (i.e. log 1000000).
If already sorted data is available, then it is better to apply an algorithm of binary search for finding some item inside instead of searching from start to the end in sequence. The application of this algorithm will help get the results very quickly.

Binary search is like looking up a phone number or a word in the dictionary
- Start in middle of book
- If the name you're looking for, comes before names on the page, search in the first half
- Otherwise, look into the second half
clip_image003
The telephone directory is the quotable example to understand the way the binary search method works.
Consider the data is present in an array as we discussed in the previous lecture. For the first implementation, we supposed that the data is not sorted in the array. For second implementation, we considered that the data inside the array is put in sorted array. The advantage of the effort of putting the data in the array in sorted order pays off when the searches on data items are performed.
Now, let’s first see the algorithm (in pseudo code) for this operation below. It is important to mention that this algorithm is independent of data type i.e. the data can be of any type numeric or string.
if ( value == middle element )
value is found
else if ( value < middle element )
search left half of list with the same method
else
search right half of list with the same method
The item we are searching for in this algorithm is called value. The first comparison of this value is made with the middle element of the array. If both are equal, it means that we have found our desired search item, which is present in the middle of the array. If this is not the case, then the value and the middle element are not the same. The else-if part of the algorithm is computed, which checks if the value is less than the middle element. If so, the left half part of the array is searched further in the same fashion (of logically splitting that half part of array into two further halves and applying this algorithm again). This search operation can also be implemented using the recursive algorithm but that will be discussed later. At the moment, we are discussing only the non-recursive algorithm. The last part of the algorithm deals with the case when the value is greater than the middle element. This processes the right half of the array with the same method.
Let’s see this algorithm in action by taking an example of an array of elements:
Binary Search – Example 1
Case 1: val == a[mid]
val = 10
low = 0, high = 8
mid = (0 + 8) / 2 = 4
clip_image005
You can see an array a in the Fig 39.2 with indexes 0 to 8 and values 1, 5, 7, 9, 10, 13, 17, 19 and 27. These values are our data items. Notice that these are sorted in ascending (increasing) order.
You can see in the first line of case 1 that val = 10, which indicates that we are searching for value 10. From second line, the range of data to search is from 0 to 8. In this data range, the middle position is calculated by using a simple formula (low + high)/2. In this case, it is mid =(0+8)/2=4. This is the middle position of the data array. See the array in the above figure Fig 39.2, which shows that the item at array position 4 is 10, exactly the value we are searching for. So, in this case, we have found the value right away in the middle position of the array. The search operation can stop here and an appropriate value can be returned back.
Let’s see the case 2 now:
Binary Search – Example 2
Case 2: val > a[mid]
val = 19
low = 0, high = 8
mid = (0 + 8) / 2 = 4
new low = mid + 1 = 5
clip_image007
The second case is about the scenario when value (val) is greater than the middle value (a[mid]). The range of data items (low and high) is the same as that in case 1. Therefore, the middle position (mid) is also the same. But the value (val) 19 is greater than the value at the middle (mid) 10 of the array. As this array is sorted, therefore, the left half of the array must not contain value 19. At this point of time, our information about val 19 is that it is greater than the middle. So it might be present in the right half of the array. The right half part starts from position 5 to position 8. It is shown in Fig 39.3 that the new low is at position 5. With these new low and high positions, the algorithm is applied to this right half again.
Now, we are left with one more case.
Binary Search – Example 3
Case 3: val < a[mid]
val = 7
low = 0, high = 8
mid = (0 + 8) / 2 = 4
new high = mid - 1 =5
clip_image009
The value to be searched (val) in this case is 7. The data range is the same starting from low=0 to high=8. Middle is computed in the same manner and the value at the middle position (mid) is compared with the val. The val is less than the value at mid position. As the data is sorted, therefore, a value lesser than the one at mid position should be present in the lower half (left half) of the array (if it is there). The left half of the array will start from the same starting position low=0 but the high position is going to be changed to mid-1 i.e. 3. Now, let’s execute this algorithm again on this left half.
clip_image011
Firstly, we will compute the middle of 0 and 3 that results in 1 in this integer division. This is shown in the top array in Fig 39.5. Now, the val 7 is compared with the value at the middle position (mid) i.e.5. As 7 is greater than 5, we will process the right half of this data range (which is positioned from low=0 to high=3). The right half of this data range starts from position 2 and ends at position 3. The new data range is low=2 and high=3. The middle position (mid) is computed as (2+3)/2=2. The value at the mid position is 7. We compare the value at mid position (7) to the val we are looking for. These are found to be equal and finally we have the desired value.
Our desired number is found within positions- 0 to 8 at position 2. Without applying this binary search algorithm, we might have performed lot more comparisons. You might feel that finding this number 7 sequentially is easier as it is found at position 2 only. But what will happen in case we are searching for number 27. In that case, we have to compare with each element present in the array to find out the desired number. On the other hand, if this number 27 is searched with the help of the binary search algorithm, it is found in third comparison.
Actually, we have already seen binary search while studying the binary search tree. While comparing the number with the root element of the tree, we had come to know that if the number was found smaller than the number in the root, we had to switch to left-subtree of the root (ensuring that it cannot be found in the right subtree).
Now, let’s see the C++ code for this algorithm:

Binary Search – C++ Code
int isPresent(int *arr, int val, int N)
{
int low = 0;
int high = N - 1;
int mid;
while ( low <= high )
{
mid = ( low + high )/2;
if (arr[mid] == val)
return 1; // found!
else if (arr[mid] < val)
low = mid + 1;
else
high = mid - 1;
}
return 0; // not found
The name of the routine is isPresent, which expects an int pointer (an array in actual); an int value val is required to be searched. Another int value is N that indicates the maximum index value of the array. Inside the body of the function, int variables low, high and mid are declared. low is initialized to 0 and high is initialized to N-1. while loop is based on the condition that executes the loop until low <= high. Inside the loop, very first thing is the calculation of the middle position (mid). Then comes the first check inside the loop, it compares the val (the value being searched) with the number at middle position (arr[mid]). If they are equal, the function returns 1. If this condition returns false, it means that the numbers are unequal. Then comes the turn of another condition. This condition (arr[mid] < val) is checking if the value at the middle is less than the value being searched. If this is so, the right half of the tree is selected by changing the position of the variable low to mid+1 and processed through the loop again. If both of these conditions return false, the left half of the array is selected by changing the variable high to mid-1. This left half is processed through the loop again. If the loop terminates and the required value is not found, then 0 is returned as shown in the last statement of this function.
You change this function to return the position of the value if it is found in the array otherwise return –1 to inform about the failure. It is important to note that this function requires the data to be sorted to work properly. Otherwise, it will fail.
This algorithm is depicted figurative in Fig 39.6.
Binary Search – Binary Tree
clip_image013
- The search divides a list into two small sub-lists till a sub-list is no more divisible.
You might have realized about the good performance of binary trees by just looking at these if you remember the fully balanced trees of N items discussed earlier.
Binary Search - Efficiency
To see the efficiency of this binary search algorithm, consider when we divide the array of N items into two halves first time.
After 1 bisection N/2 items
After 2 bisections N/4 = N/22 items
. . .
After i bisections N/2i =1 item
____________________________________
i = log2 N
First half contains N/2 items while the second half also contains around N/2 items. After one of the halves is divided further, each half contains around N/4 elements. At this point, only one of N/4 items half is processed to search further. If we carry out three bisections, each half will contain N/8 items. Similarly for i bisections, we are left with N/2i, which is at one point of time is only one element of the array. So we have the equation here:
N/2i =1
Computing the value of i from this gives us:
i = log2 N
Which shows that after maximum log2 N bisections, either you will be successful in finding your item or fail to do so.
This was our second implementation of table or dictionary abstract data type using sorted sequential array. As discussed at start of it that if we implement table or dictionary abstract data type using an array, we have to keep the array elements in sorted order. An easier way to sort them can be that whenever we want to insert an element in the array, firstly we find its position (in sorted order) in the array and then shift the right side (of that position) elements one position towards right to insert it. In worst case, we might have to shift all the elements one position towards right just to keep the data sorted so this will be proportional to N. Similarly the remove operation is also proportional to N. But after keeping the data sorted, the search operation is returned within maximum log2 N bisections.






















































































Reactions:

0 comments:

Post a Comment