23k 23k - 7 months ago 9
Java Question

Writing a remove method for a class implementing Iterable

So I have a class implementing Iterable to write a set of methods for. Most of them are pretty straightforward to think through, however, I'm having trouble writing a remove method for the class.

import java.util.Iterator;

public class Bag<Item> implements Iterable<Item> {

private Item[] data = (Item[]) new Object[5];

// The size variable keeps track of how many items
private int size = 0;

public String toString() {
StringBuilder b = new StringBuilder("[");
for (Item i : this)
b.append(i + " ");
return b.toString() + "]";
}

public void expandArray() {
int capacity = data.length * 2;
Item[] newData = (Item[]) new Object[capacity];
for (int i = 0; i < data.length; i++)
newData[i] = data[i];
data = newData;
}

public boolean add(Item x) {
if (size == data.length)
expandArray();
data[size++] = x;
return true;
}

// return an Iterator for the bag
public Iterator<Item> iterator() {
return new BagIterator<Item>();
}

// Iterator class
public class BagIterator<Item> implements Iterator<Item> {

private int i = 0;

public boolean hasNext() {
return i < size;
}

public Item next() {
return (Item) data[i++];
}

}

public boolean contains(Item x) {
for (int i = 0; i < data.length; i++) {
if (data[i] == x)
return true;
}
return false;
}

public boolean addUnique(Item x) {
for (int i = 0; i < data.length; i++) {
if (data[i] == x)
return false;
}
this.size++;
this.add(x);
return true;
}

public boolean remove(Item x) {

Item lastItem = x; // holds x item
Item swap; // holds item to swap
int swapIndex; // holds index of item to swap

for (int i = 0; i < data.length; i++) {
if (data[i] == x) {
// Save the last item
lastItem = data[3];
// Save the swapped item
swap = data[i];
// Save the index of swapped item
swapIndex = i;
// move swap item to end of list
data[3] = swap;
// move last item to swap pos
data[swapIndex] = lastItem;
// remove last item in list
this.size--;
return true;
}
}
return false;
}

public boolean equals(Object o) {
Bag<Item> b = (Bag<Item>) o;
return false;
}
}


My thoughts behind the remove method are as follows; run through the Bag, find the item to remove, take that same item and move it to the end of the bag (swapping its place with the last item in the bag), then decrease the size of the bag (thinking that it would remove it).

Now clearly there are some issues with my thinking. 1) The bag is still its original size. 2) The Bag is now unordered, which will cause a problem later on when comparing two bags.

So my question is, how can I effectively write a remove method to take an item out of my bag class without running into the issues I previously mentioned.

Main

public class Main {

public static void main (String[] args) {

Bag<Integer> bag = new Bag<>();

bag.add(1);
bag.add(2);
bag.add(3);
bag.add(4);

System.out.println(bag); // [1, 2, 3, 4]

System.out.println(bag.remove(4)); // should remove 4 and return true **WORKING
System.out.println(bag.remove(1)); // should remove 1 and return true **WORKING
System.out.println(bag.remove(1)); // should NOT remove 1 and return false **NOT WORKING

System.out.println(bag); // [4 ]

}
}

Answer

Somehow I totally misread your question the first time around; I see now you're not even implementing Iterator.remove(). Iterable.remove() is also tricky, let's talk about that!

First off, you probably don't want to implement Iterable directly. It only lets you iterate over a sequence of elements (and optionally call Iterator.remove()), nothing else. Your Bag class should instead probably implement Collection (or extend AbstractCollection) which is the general interface most Java data structures implement (it specifies .add(), .remove(), .size(), and so on).

The key thing to remember when creating a data structure, such as a bag, is to enforce your invariants. Roughly speaking an invariant is a guarantee about how your data structure or method will behave. For example, every time you call .size() on an empty data structure it will return 0. Once you call .add(), all future calls to .size() will return 1 (until you modify the data structure further). That's an invariant. It might seem obvious, but as your data structures get more complex these sort of simple guarantees make reasoning about your code much simpler.

Onto to your specific question. First, your idea to move the item to remove to the end is a good intuition. It's much more efficient than copying the remaining elements into new indices.

Your first concern, that the Bag is still the same size, is actually not a problem. Since you decrement size the item at the end of the array is - effectively - removed. You could set it to null to remove the reference but you don't need to. Instead you should look at your other methods, such as contains(), and make sure that they respect size. You should only look at indicies smaller than size, rather than data.length, because the values inbetween size and data.length may be garbage.

Your second concern, about comparing two Bags, exposes another issue. Your class doesn't actually provide an invariant (there's that word again) that the array will ever be ordered, so re-ordering it during .remove() doesn't make things any worse than they were beforehand. What is a problem is your class doesn't override .equals() and .hashcode() (you have to do neither or both), meaning that no two Bag instances can ever be considered equivalent, regardless of the order of their elements. If you intend for Bags to be compared you need to implement those methods correctly. Assuming you want to do that, how is definitely tricky. Generally speaking there is no efficient way to compare two unordered collections of objects - your only choice is to iterate over all the elements of both collections until you've verified they contain the same elements. This is O(n^2) or quadratic in performance (i.e. pretty slow).

You have two basic choices; ensure your backing array is always sorted (that's an invariant - at the end of every method the array will be sorted) or use a more efficient data structure for equality checks such as a Set. Both options have tradeoffs. Correctly ensuring an array is always sorted is very tricky; Java's TreeMap does this and most people consider this the type of code they'd never ever want to re-implement themselves. Using a Set lets you efficiently check if an element exists in your collection, but comes at the cost of preventing duplicate elements. A "bag" data structure generally permits duplicates, so using a Set as a backing data structure may not be acceptable for your use case. Using a Map<E, Integer> where the value is a count would work, but is a little more paperwork to keep track of.

Nevertheless, as a starting point a standard brute force .equals() implementation may be good enough. A central tenant of good software engineering is to avoid over-optimization. Start with something that works and then make it better, rather than trying to make something perfectly efficient from the get-go.

Comments