Don’t you hate when the computer does exactly what you told it do versus what you think you told it to? I’ve been burned in the past with lazy collection/stream/iterator operator evaluations in the past. Specifically I am burned by the fact that Dart doesn’t force you to explicitly ask for a new iterator. It is a common thing to be careful about but it burned me in the refactoring of my Twitter Account Shredder just now. Even though it only burned up a couple minutes of time I figured I’d re-document for posterity here for others. I had another more detailed post on the topic is here earlier this year.
Lazy evaluation is what gives streams and iterators their power. Rather than working in bulk on an entire large array of objects it can evaluate them when it is appropriate to do so. This means that if you have a multi-GB dataset there is no need to load the entire thing into memory and bulk operator on it. The code may read like that is what happens. In fact it just processes it element-wise. This power is also what gives us the ability to operate on open ended streams in an identical way.
For the problem I was trying to solve I’m pulling the top IDs from Twitter to create an ID stream for consumption by a delete operation. It gets a certain number of Tweet/Like/Following IDs with each call which fills a queue that is then processed. When the queue is empty it goes back to Twitter to see if there are more IDs. That’s the way it works in a delete operation. However if it is running and the deletes are not happening I don’t want it to keep refilling with the same ID list. The stream should be empty if it has served all of the top IDs. I therefore decided to keep track of previously seen IDs in a hash set. The idea is that it’ll get new IDs, see if it has been seen before and only return new ones. If there are no new ones then it returns nothing. The code snippet of the original coding is here:
Future<List<String>> _getMoreIds() async {
final newIds = <String>[];
switch(type) {
case TwitterClientIDStreamType.following:
newIds.addAll(await _getFollowers());
break;
case TwitterClientIDStreamType.likes:
newIds.addAll(await _getLikes());
break;
case TwitterClientIDStreamType.statuses:
newIds.addAll(await _getTweets());
break;
}
final idsToReturn = newIds.where((id) => !_seenIds.contains(id));
_seenIds.addAll(idsToReturn);
return idsToReturn.toList();
Looks good. So I run it in the tests and it unfortunately always returns an empty list. Why would that be? It all goes back to the last three lines:
final idsToReturn = newIds.where((id) => !_seenIds.contains(id));
_seenIds.addAll(idsToReturn);
return idsToReturn.toList();
How was I interpreting that not thinking about lazy evaluation:
- Create a collection of subsets of the newIds that are not in the seenIds set
- Add the remaining IDs to the seenIds set so that on the next pass they will get filtered
- Return the collection in
List
form.
What is actually happening is:
- Create an Iterable from the original collection which will apply the
where
clause at invocation time - Process the idsToReturn iterator to add all surviving elements
- Process the idsToReturn iterator and return the surviving elements as a list
It is the nature of the iterators that they are not persistent. So if they are called multiple times they start from scratch again. So the first time through it evaluated returning any of the new IDs. Those were then added to the seenIds set. When I asked to return a list the iterator ran again. This time however all those “new” IDs are in the seenIds set and are filtered out. Therefore this function will always return an empty list.
Some languages have syntax to prevent this sort of mistake. Dart has no such mechanism so that’s why I keep on getting bitten by it.
The code that does what I want it to is:
final idsToReturn = newIds.where((id) => !_seenIds.contains(id)).toList();
seenIds.addAll(idsToReturn);
return idsToReturn;
Which does exactly what I thought I meant to do in the first place.