How to Remove Duplicate Elements from an Array

Posts

Arrays are foundational structures in programming that allow the storage and management of multiple values of the same data type under a single variable name. This simplifies data handling and improves efficiency when dealing with large volumes of structured information. Arrays offer a straightforward way to group data such as integers, characters, booleans, or even objects in higher-level languages. Each element in an array can be accessed using its index, which starts from zero. This direct access capability makes arrays a go-to data structure in numerous algorithmic solutions.

Despite their simplicity and versatility, arrays often encounter a common problem—duplicate elements. Duplicate elements in an array can result from repeated user input, sensor readings, or data aggregation processes. While duplicates may not always be harmful, they can compromise data accuracy in certain applications such as search algorithms, statistical analysis, and data visualization. Therefore, removing duplicates becomes essential to ensure data integrity and improve performance.

This article explores multiple techniques to remove duplicate elements from arrays across different programming languages. Each method highlights its implementation logic and benefits. In this first part, we focus on understanding arrays in depth and dive into the first method—using a temporary array to eliminate duplicates.

Understanding Arrays in Programming

Arrays are linear data structures that store a fixed-size collection of elements of the same type. For instance, an array can hold five integers, and all values will reside in contiguous memory locations. This predictable layout enhances memory access speed and simplifies data processing. Arrays can be one-dimensional, such as a list of numbers, or multi-dimensional, such as a table of values.

Each element in an array is accessed using its index. In most programming languages like C, Java, and Python, the index starts from zero. This zero-based indexing means the first element is at position zero, the second at one, and so on. Accessing an element using its index is done in constant time, which gives arrays a significant performance advantage over other structures like linked lists.

However, arrays come with limitations. They have a fixed size, and inserting or deleting elements involves shifting other elements, which can be inefficient. Also, arrays do not inherently prevent the inclusion of duplicate values. This becomes problematic in applications that require unique entries.

Duplicates often arise due to unsorted data, lack of validation, or merging of datasets. For example, merging two lists of users may result in duplicate user IDs. Similarly, reading real-time data from sensors can lead to repeated values due to glitches or repeated measurements. In such cases, duplicate elements need to be removed to clean the data before further processing.

Introduction to Removing Duplicates Using a Temporary Array

The simplest and most intuitive way to remove duplicates from an array is by using a temporary array. This method works effectively for arrays with moderate sizes and when preserving the order of elements is necessary. The idea is to traverse the original array, check whether each element has been encountered before, and if not, store it in a new array. By the end of the process, the new array contains only unique elements.

This method is language-independent and can be implemented in any programming language that supports arrays and basic control structures like loops and conditionals. Below are implementations of this method in three widely used programming languages—C, Python, and Java.

Using a Temporary Array in C

C is a procedural language that provides low-level memory access. This makes it suitable for performance-sensitive applications but requires careful memory management. In C, we declare a temporary array with the same size as the original array. We then iterate through the original array and copy only those elements that are not duplicates. After the loop, the contents of the temporary array are copied back to the original array.

Here is the C implementation of this method:

c

CopyEdit

#include <stdio.h>

void removeDuplicates(int arr[], int *size) {

    if (*size <= 1) {

        return;

    }

    int temp[*size];

    int j = 0;

    for (int i = 0; i < *size – 1; i++) {

        if (arr[i] != arr[i + 1]) {

            temp[j++] = arr[i];

        }

    }

    temp[j++] = arr[*size – 1];

    for (int i = 0; i < j; i++) {

        arr[i] = temp[i];

    }

    *size = j;

}

int main() {

    int arr[] = {1, 2, 2, 3, 4, 4, 5};

    int size = sizeof(arr) / sizeof(arr[0]);

    printf(“Original Array: “);

    for (int i = 0; i < size; i++) {

        printf(“%d “, arr[i]);

    }

    removeDuplicates(arr, &size);

    printf(“\nArray after removing duplicates: “);

    for (int i = 0; i < size; i++) {

        printf(“%d “, arr[i]);

    }

    return 0;

}

In this example, the array is first scanned for consecutive duplicates. This means that the original array should be sorted beforehand. The function removeDuplicates performs in linear time after sorting, which itself takes O(n log n) time. The space complexity is O(n) due to the temporary array.

Using a Temporary Array in Python

Python offers a more readable and high-level way of working with arrays through lists. Lists in Python are dynamic and come with built-in methods to simplify many operations. The temporary array method in Python involves iterating through the list and appending unique items to a new list.

Here is the implementation in Python:

python

CopyEdit

def remove_duplicates_temp_array(arr):

    temp_array = []

    for item in arr:

        if item not in temp_array:

            temp_array.append(item)

    return temp_array

arr = [1, 2, 2, 3, 4, 4, 5]

print(f”Original Array: {arr}”)

print(f”Array after removing duplicates: {remove_duplicates_temp_array(arr)}”)

In this example, each element is checked against the temporary array using the in keyword. Although intuitive, this method has a time complexity of O(n^2) due to repeated membership checks. However, for small datasets, this tradeoff is acceptable. The advantage lies in its simplicity and preservation of order without requiring the array to be sorted.

Using a Temporary Array in Java

Java uses arrays and collections to manage data. One of the simplest ways to implement the temporary array method in Java is to use an ArrayList, which is a dynamic array structure. Java’s strong type-checking and object-oriented nature allow for clean and modular code.

Here is the Java implementation:

java

CopyEdit

import java.util.ArrayList;

import java.util.Arrays;

import java.util.List;

public class RemoveDuplicatesTempArray {

    public static List<Integer> removeDuplicates(int[] arr) {

        List<Integer> tempArray = new ArrayList<>();

        for (int item : arr) {

            if (!tempArray.contains(item)) {

                tempArray.add(item);

            }

        }

        return tempArray;

    }

    public static void main(String[] args) {

        int[] arr = {1, 2, 2, 3, 4, 4, 5};

        System.out.println(“Original Array: ” + Arrays.toString(arr));

        System.out.println(“Array after removing duplicates: ” + removeDuplicates(arr));

    }

}

This implementation uses the contains method of ArrayList to check for duplicates and appends only unique values. The time complexity is similar to Python’s approach, approximately O(n^2), due to the linear search performed by contains. However, the method maintains the insertion order and does not require sorting.

Using a temporary array is a simple and clear approach to removing duplicates. It does not alter the original order of elements and works across various programming languages. This method is especially useful when data integrity and readability are more important than performance. However, it may not scale well for very large datasets due to its O(n^2) time complexity in some implementations. In future sections, more efficient methods that trade off simplicity for performance will be explored.

Removing Duplicates Using Extra Space (Set or Hash Table)

As datasets grow larger and performance becomes more critical, methods that reduce time complexity are preferred. One such approach is using extra space, typically in the form of a set or hash table. These data structures offer fast lookup and insertion times, often in constant average time. This allows for more efficient duplicate detection and removal compared to the temporary array method.

A set is a data structure that stores only unique values. It automatically discards duplicates when new elements are added. Hash tables, or hash maps, associate each element with a unique key, which enables quick access and prevents duplication by design. Most modern programming languages offer built-in implementations of sets or similar structures, making them easy to use.

This section explains how to remove duplicates using sets in three popular languages: C++, Python, and Java.

Using a Set in C++

C++ provides the std::set container from the Standard Template Library (STL). A set in C++ automatically stores elements in sorted order and ensures uniqueness. Adding an element that already exists has no effect.

Here is an example using std::set in C++:

cpp

CopyEdit

#include <iostream>

#include <set>

#include <vector>

std::vector<int> removeDuplicates(const std::vector<int>& arr) {

    std::set<int> uniqueElements(arr.begin(), arr.end());

    return std::vector<int>(uniqueElements.begin(), uniqueElements.end());

}

int main() {

    std::vector<int> arr = {1, 2, 2, 3, 4, 4, 5};

    std::vector<int> result = removeDuplicates(arr);

    std::cout << “Original Array: “;

    for (int num : arr) {

        std::cout << num << ” “;

    }

    std::cout << “\nArray after removing duplicates: “;

    for (int num : result) {

        std::cout << num << ” “;

    }

    return 0;

}

This implementation constructs a set from the original vector. Since sets ignore duplicates, only unique values remain. The result is then copied into a new vector for output. The time complexity is O(n log n) because sets in C++ use balanced trees internally. The space complexity is O(n), proportional to the number of unique elements.

Using a Set in Python

Python’s set is a built-in data type that provides an unordered collection of unique elements. It supports efficient membership checks and element insertion. Python makes it very easy to convert a list to a set and then back to a list to remove duplicates.

Here is the Python implementation:

python

CopyEdit

def remove_duplicates_with_set(arr):

    return list(set(arr))

arr = [1, 2, 2, 3, 4, 4, 5]

print(f”Original Array: {arr}”)

print(f”Array after removing duplicates: {remove_duplicates_with_set(arr)}”)

This approach is compact and efficient. Converting the list to a set automatically filters out duplicates. The final list may not retain the original order because sets are unordered collections. For many applications, this is acceptable. The time complexity is approximately O(n), and the space complexity is also O(n), making this method highly efficient for large datasets.

If maintaining the original order is important, a set can still be used while iterating through the list:

python

CopyEdit

def remove_duplicates_preserve_order(arr):

    seen = set()

    result = []

    for item in arr:

        if item not in seen:

            seen.add(item)

            result.append(item)

    return result

This version combines the efficiency of a set with the requirement of order preservation.

Using a Set in Java

Java provides the HashSet class, part of the Java Collections Framework. A HashSet stores only unique elements and offers constant-time performance for the basic operations of add, remove, and contains. When converting from an array to a set and back, Java allows us to quickly eliminate duplicates.

Here is the Java implementation:

java

CopyEdit

import java.util.Arrays;

import java.util.HashSet;

import java.util.Set;

import java.util.List;

import java.util.ArrayList;

public class RemoveDuplicatesWithSet {

    public static List<Integer> removeDuplicates(int[] arr) {

        Set<Integer> set = new HashSet<>();

        for (int num : arr) {

            set.add(num);

        }

        return new ArrayList<>(set);

    }

    public static void main(String[] args) {

        int[] arr = {1, 2, 2, 3, 4, 4, 5};

        List<Integer> result = removeDuplicates(arr);

        System.out.println(“Original Array: ” + Arrays.toString(arr));

        System.out.println(“Array after removing duplicates: ” + result);

    }

}

This method creates a HashSet and adds each element from the array to it. Duplicates are automatically discarded. The final list is created from the set for ease of use. Like Python, Java’s set implementation does not preserve order. If order matters, LinkedHashSet can be used instead.

java

CopyEdit

import java.util.LinkedHashSet;

Set<Integer> set = new LinkedHashSet<>();

The use of LinkedHashSet ensures that elements retain their original insertion order while still enforcing uniqueness.

Removing Duplicates by Sorting the Array

Another practical and widely used approach to removing duplicate elements from an array involves sorting the array first. Once the elements are sorted, duplicates appear next to each other. This makes it easier to identify and skip repeated values while traversing the array just once.

This method is particularly useful in languages where built-in sort functions are efficient, and it works well when maintaining the relative order of original elements is not a requirement. Sorting simplifies the comparison process and reduces the number of operations needed to identify duplicates.

This section provides implementations of this method in C, Python, and Java.

Using Sorting in C

In C, arrays can be sorted using the standard library function qsort. After sorting, the array is scanned, and only the first occurrence of each unique element is retained.

Here is the C implementation using sorting:

c

CopyEdit

#include <stdio.h>

#include <stdlib.h>

int compare(const void* a, const void* b) {

    return (*(int*)a – *(int*)b);

}

void removeDuplicatesSorted(int arr[], int *size) {

    if (*size <= 1) return;

    qsort(arr, *size, sizeof(int), compare);

    int j = 0;

    for (int i = 1; i < *size; i++) {

        if (arr[i] != arr[j]) {

            j++;

            arr[j] = arr[i];

        }

    }

    *size = j + 1;

}

int main() {

    int arr[] = {4, 2, 1, 2, 3, 4, 5};

    int size = sizeof(arr) / sizeof(arr[0]);

    printf(“Original Array: “);

    for (int i = 0; i < size; i++) {

        printf(“%d “, arr[i]);

    }

    removeDuplicatesSorted(arr, &size);

    printf(“\nArray after removing duplicates: “);

    for (int i = 0; i < size; i++) {

        printf(“%d “, arr[i]);

    }

    return 0;

}

In this example, the array is sorted using qsort. Then, a single pass removes duplicates by comparing each element with the previous unique one. The time complexity is O(n log n) due to sorting, and the space complexity is O(1) since the operation is done in place.

Using Sorting in Python

Python provides a built-in sort method for lists, making sorting straightforward. After sorting, a loop is used to filter out duplicates by comparing each element to the previous one.

Here is the Python implementation:

python

CopyEdit

def remove_duplicates_by_sorting(arr):

    if not arr:

        return []

    arr.sort()

    result = [arr[0]]

    for i in range(1, len(arr)):

        if arr[i] != arr[i – 1]:

            result.append(arr[i])

    return result

arr = [4, 2, 1, 2, 3, 4, 5]

print(f”Original Array: {arr}”)

print(f”Array after removing duplicates: {remove_duplicates_by_sorting(arr)}”)

In this method, the original list is sorted in place. Then, a new list is built by adding only elements that are different from the one before them. This method has a time complexity of O(n log n) and a space complexity of O(n) due to the result list.

Using Sorting in Java

Java provides the Arrays.sort() method, which sorts arrays efficiently. After sorting, the array can be traversed to filter out duplicates.

Here is the Java implementation:

java

CopyEdit

import java.util.Arrays;

public class RemoveDuplicatesBySorting {

    public static int[] removeDuplicates(int[] arr) {

        if (arr.length == 0) return new int[0];

        Arrays.sort(arr);

        int j = 0;

        for (int i = 1; i < arr.length; i++) {

            if (arr[i] != arr[j]) {

                j++;

                arr[j] = arr[i];

            }

        }

        return Arrays.copyOfRange(arr, 0, j + 1);

    }

    public static void main(String[] args) {

        int[] arr = {4, 2, 1, 2, 3, 4, 5};

        System.out.println(“Original Array: ” + Arrays.toString(arr));

        int[] result = removeDuplicates(arr);

        System.out.println(“Array after removing duplicates: ” + Arrays.toString(result));

    }

}

This implementation uses sorting to bring duplicates together. A loop then places unique elements at the front of the array. The resulting array is sliced to include only those elements. The time complexity is O(n log n), and the space complexity is O(n) due to the new array being returned.

Sorting-Based Method

Removing duplicates by first sorting the array is an effective method when performance and simplicity are both important. Sorting groups duplicate values together, making them easy to identify and eliminate in a single pass.

This approach does not require additional data structures like sets or hash maps. However, it changes the order of elements, which may not be acceptable in all situations. Sorting can also add extra time complexity, though it is acceptable for medium to large-sized datasets.

This method strikes a good balance between readability, performance, and broad language compatibility.

Removing Duplicates In-Place Without Extra Space

In situations where memory is limited or optimal space usage is crucial, removing duplicates in-place becomes the preferred strategy. The in-place method modifies the original array directly without allocating additional space for temporary arrays or data structures. This technique is especially useful for embedded systems or performance-sensitive environments.

The core idea is to shift unique elements toward the beginning of the array while maintaining a pointer or index to track where the next unique element should be placed. This method works best when the array is sorted, as it relies on detecting duplicates based on adjacent elements.

This section demonstrates how to implement in-place duplicate removal in C, Python, and Java.

In-Place Removal in C

C provides low-level memory access and allows direct manipulation of array elements. In this implementation, a single pass through the array identifies unique elements, which are then placed at the start of the array.

Here is the in-place method in C:

c

CopyEdit

#include <stdio.h>

#include <stdlib.h>

int compare(const void *a, const void *b) {

    return (*(int *)a – *(int *)b);

}

int removeDuplicatesInPlace(int arr[], int size) {

    if (size == 0 || size == 1) return size;

    qsort(arr, size, sizeof(int), compare);

    int j = 0;

    for (int i = 1; i < size; i++) {

        if (arr[i] != arr[j]) {

            j++;

            arr[j] = arr[i];

        }

    }

    return j + 1;

}

int main() {

    int arr[] = {4, 2, 1, 2, 3, 4, 5};

    int size = sizeof(arr) / sizeof(arr[0]);

    int newSize = removeDuplicatesInPlace(arr, size);

    printf(“Array after removing duplicates: “);

    for (int i = 0; i < newSize; i++) {

        printf(“%d “, arr[i]);

    }

    return 0;

}

This approach first sorts the array using qsort, then processes it in-place to move unique elements forward. The final array contains only unique values at the beginning, and the function returns the new size. Time complexity is O(n log n), and space complexity is O(1) beyond sorting.

In-Place Removal in Python

Python lists are mutable, but removing elements in-place requires careful handling of indices. One way to simulate in-place behavior is by overwriting values and slicing the final result.

Here is the Python implementation:

python

CopyEdit

def remove_duplicates_in_place(arr):

    if not arr:

        return []

    arr.sort()

    j = 0

    for i in range(1, len(arr)):

        if arr[i] != arr[j]:

            j += 1

            arr[j] = arr[i]

    return arr[:j+1]

arr = [4, 2, 1, 2, 3, 4, 5]

print(f”Original Array: {arr}”)

print(f”Array after removing duplicates: {remove_duplicates_in_place(arr)}”)

This method sorts the list and then performs a single traversal. It keeps track of the position for the next unique element and rewrites the array as it progresses. The result is returned as a sliced view of the original list. Time complexity is O(n log n), and no additional space is used beyond the input list.

In-Place Removal in Java

In Java, arrays have fixed sizes, so modifying them in-place means rearranging the elements and tracking the valid portion of the array. This method sorts the array and then copies unique elements to the front.

Here is the Java implementation:

java

CopyEdit

import java.util.Arrays;

public class RemoveDuplicatesInPlace {

    public static int removeDuplicates(int[] arr) {

        if (arr.length == 0) return 0;

        Arrays.sort(arr);

        int j = 0;

        for (int i = 1; i < arr.length; i++) {

            if (arr[i] != arr[j]) {

                j++;

                arr[j] = arr[i];

            }

        }

        return j + 1;

    }

    public static void main(String[] args) {

        int[] arr = {4, 2, 1, 2, 3, 4, 5};

        int newSize = removeDuplicates(arr);

        System.out.print(“Array after removing duplicates: “);

        for (int i = 0; i < newSize; i++) {

            System.out.print(arr[i] + ” “);

        }

    }

}

This implementation mirrors the logic of the C and Python versions. The array is sorted, then scanned for unique values. Elements are overwritten as needed, and the final size is returned. The time complexity is O(n log n), and space usage is constant.

Final thoughts 

The in-place method is ideal when conserving memory is a priority. It avoids creating new arrays or data structures, working directly with the original data. While it requires sorting the array first, the savings in space make it suitable for applications where memory allocation is constrained.

However, this method modifies the original array, which may not be acceptable in all scenarios. It also does not preserve the original order unless the array was already sorted. Despite these trade-offs, in-place removal offers a clean and efficient solution when space efficiency is critical.