Shaan Singh - 4 months ago
Swift Question

CloudKit performance using a CKReference vs. CKRecord

Let's say I have a CKRecord of recordType Post. A Post holds a few values, like title and description. When a Post is displayed in the app, it is accompanied by the name and profile picture of the user who wrote it (let's call them the Writer). My question: would it be better to store a CKReference to the Writer's Profile (Profile is another record type that holds the Writer's details), or to copy the Writer's details directly onto the Post when they write it?

The first option makes perfect sense from a database schema perspective, but it seems really bad from a performance perspective. With thousands of users on this system, the number of fetches and the time needed to load them all seem unreasonable.

The first part involves loading all Posts.

    func loadPosts() {
        // ...Setup the query
        publicData.performQuery(query, inZoneWithID: nil) { (results: [CKRecord]?, error: NSError?) in
            if let posts = results {
                self.loadProfiles(posts)
            }
        }
    }


One query done; now we call loadProfiles:


    func loadProfiles(posts: [CKRecord]) {
        // Get the reference IDs out of the Posts
        var referenceIDs = [CKRecordID]()
        for post in posts {
            // Get the reference from the post
            // Append the recordID to the referenceIDs array
        }

        // Perform the Profiles fetch
        let fetchOperation = CKFetchRecordsOperation(recordIDs: referenceIDs)
        fetchOperation.fetchRecordsCompletionBlock = { records, error in
            // ...Handle the fetched Profiles

            // Everything has been fetched, update the UI now
            dispatch_async(dispatch_get_main_queue(), {
                self.tableView.reloadData()
            })
        }
        CKContainer.defaultContainer().publicCloudDatabase.addOperation(fetchOperation)
    }


In that function, we spent time grabbing the referenceIDs. We then spent time doing the Profile fetch. Mind that all of this is happening after the original Post fetch!

...Yikes. Even with some sort of caching system, the original fetch would be crazy (especially with lots of users).

So, would it be better to directly add the Writer's details to the Post when they write it? Pros: fewer fetches, faster loading. Cons: if the Writer ever changes their Profile details, the app will have to loop through all of their Posts and manually update the details.

This whole dilemma reeks of a pick-your-poison scenario. Is there a better way to do this?

Answer

The term for what you're describing is denormalization: copying the Writer's details onto each Post so that reads are cheaper, at the cost of keeping the copies in sync. Weighing the tradeoff depends both on the underlying technology and on the application domain and expected behavior.

You've described two model objects, a Post and a Profile, with a Writer reference to a Profile on a Post. You haven't described exactly how they're used, but I'm going to assume a scrolling list of posts in a table view, with the writer's name and profile picture on each cell in the list. It's obvious why you're concerned about pulling references for each one.

CloudKit's priority is minimizing the number of round trips to the server. However, one fetch for posts and a second fetch for the linked profile names isn't particularly onerous. Very important: use the desiredKeys property on fetches and queries, here and whenever you can. CloudKit fetches the entire record by default, so you would probably be passing extraneous information over the wire. It's the difference between getting the users' names and getting the users' complete profiles.

The advice to use desiredKeys wherever possible is not hammered home nearly enough in the documentation, given how important it is for optimization and how simple it is to implement.
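As a rough sketch of the two-fetch approach with desiredKeys, the snippet below builds one query for Posts and one batched fetch for the Profiles they reference. It is written with current Swift spellings (the question's snippets use the older Swift 2 names), and the field names "title", "writer", "name" and "picture" are assumptions, not something the question specifies. Note that it deduplicates Profile IDs, so several posts by the same Writer cost only one Profile in the second fetch.

```swift
import CloudKit

// Hypothetical field names; substitute the keys from your own schema.
let postKeys = ["title", "writer"]
let profileKeys = ["name", "picture"]

/// Builds a query for Post records that downloads only `postKeys`.
func makePostsQuery() -> CKQueryOperation {
    let query = CKQuery(recordType: "Post", predicate: NSPredicate(value: true))
    let operation = CKQueryOperation(query: query)
    operation.desiredKeys = postKeys
    return operation
}

/// Collects each distinct Profile ID referenced by the posts and builds one
/// batched fetch that downloads only `profileKeys`.
func makeProfilesFetch(for posts: [CKRecord]) -> CKFetchRecordsOperation {
    var seen = Set<CKRecord.ID>()
    var profileIDs = [CKRecord.ID]()
    for post in posts {
        guard let writer = post["writer"] as? CKRecord.Reference else { continue }
        // Keep the first occurrence of each Profile ID, in post order.
        if seen.insert(writer.recordID).inserted {
            profileIDs.append(writer.recordID)
        }
    }
    let operation = CKFetchRecordsOperation(recordIDs: profileIDs)
    operation.desiredKeys = profileKeys
    return operation
}
```

Both functions only build the operations; you would still attach completion handlers and add them to the public database, as in the question's code.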

But if you're concerned about staying responsive when pulling posts, for example if the user is going to be waiting while you pull more, you might want to denormalize.

I'm also going to assume a Profile doesn't change very often (this is a key point), but it can change and the app needs to account for that. It's actually pretty simple: looping through the posts to update them isn't ideal, but it's not that big a deal, because it's a one-off thing that you don't expect to happen very often. You should be able to do it with a CKQueryOperation followed by a CKModifyRecordsOperation.

Again, be sure to use desiredKeys. If you're fetching posts only to update their denormalized fields and don't plan to display them, you don't want to pass the full contents of every post over the wire.
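A minimal sketch of that update pair, assuming each Post carries a "writer" reference and a denormalized "writerName" field (both names are hypothetical). The first function builds the query for one Writer's posts, asking only for the denormalized field; the second writes the new name onto the fetched records and builds the save.

```swift
import CloudKit

/// Builds a query for every Post written by `profileID`, downloading only the
/// denormalized field. "writer" and "writerName" are hypothetical field names.
func makeWriterPostsQuery(profileID: CKRecord.ID) -> CKQueryOperation {
    let writer = CKRecord.Reference(recordID: profileID, action: .none)
    let query = CKQuery(recordType: "Post",
                        predicate: NSPredicate(format: "writer == %@", writer))
    let operation = CKQueryOperation(query: query)
    operation.desiredKeys = ["writerName"]
    return operation
}

/// Writes the new name onto the fetched posts and builds the save operation.
func makeRenameOperation(posts: [CKRecord], newName: String) -> CKModifyRecordsOperation {
    for post in posts {
        post["writerName"] = newName
    }
    let operation = CKModifyRecordsOperation(recordsToSave: posts,
                                             recordIDsToDelete: nil)
    // Save only the keys changed above rather than whole records.
    operation.savePolicy = .changedKeys
    return operation
}
```

To wire these together, you would collect the records the query delivers, pass them to makeRenameOperation, and add the result to the same database. A real implementation would also need to follow the query cursor when there are more results and split very large saves into batches.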

Note that if you denormalize profile pictures, you will probably want to make sure every post uses the same CKAsset, so you get the built-in caching and don't accidentally upload, download, and store the same image many times. See the caveats about how CKAsset and local caching work; apparently, if you want an asset guaranteed to be stored locally, you have to cache it yourself.
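One way to "cache it yourself" might look like the sketch below, which copies a downloaded asset's file into the app's Caches directory. The function name and the caller-chosen file name are illustrative assumptions, and the sketch does nothing about invalidating the copy when the picture changes.

```swift
import CloudKit

/// Copies a downloaded asset into the Caches directory under `name` and
/// returns the copy's URL, or nil if the asset has no local file.
/// Hypothetical helper; an existing file with the same name is reused as-is.
func cacheLocally(_ asset: CKAsset, named name: String) throws -> URL? {
    guard let source = asset.fileURL else { return nil }
    let caches = try FileManager.default.url(for: .cachesDirectory,
                                             in: .userDomainMask,
                                             appropriateFor: nil,
                                             create: true)
    let destination = caches.appendingPathComponent(name)
    if !FileManager.default.fileExists(atPath: destination.path) {
        try FileManager.default.copyItem(at: source, to: destination)
    }
    return destination
}
```

Naming the cached file after the Profile's record name plus something that changes with the picture (a modification date, for instance) is one possible way to avoid serving a stale image.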

That's a good point at which to note that the CloudKit data types are specifically not meant to be used as the model objects in your app, and how your model layer interacts with them will affect whichever choice you make.
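For illustration, here is one possible shape for that separation: a plain Swift struct that the UI works with, built from a CKRecord at the boundary. The struct and the "title" and "writer" field names are assumptions made for the example.

```swift
import CloudKit

/// App-level model for a post; the UI works with this, not with CKRecord.
/// The "title" and "writer" keys are hypothetical field names.
struct Post {
    let id: String
    let title: String
    let writerID: String?

    /// Returns nil when the record is not a Post or lacks a title.
    init?(record: CKRecord) {
        guard record.recordType == "Post",
              let title = record["title"] as? String else { return nil }
        self.id = record.recordID.recordName
        self.title = title
        self.writerID = (record["writer"] as? CKRecord.Reference)?.recordID.recordName
    }
}
```

With a layer like this, switching between the referenced and denormalized designs changes only the mapping code, not the views that display a Post.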

CloudKit is actually great, but in my opinion it is held back by its unsystematic documentation. I was fortunate enough to go to WWDC this year ('16) and was able to talk to some CloudKit engineers, which is where some of this info comes from. Hope this helps.