RLH - 21 days ago
Python Question

How can I create two unique, queryable fields for a GAE Datastore data model?

First, a little setup. Last week I was having trouble implementing a scheme I had designed to manage two unique fields on one db.Model object. Since that isn't possible directly, I created a parent entity class and a child entity class, and assigned one of the unique values to each entity's key_name. You can find my previous question here; it includes my sample code and a general explanation of my insertion process.

On that question, a commenter pointed out that my solution would not actually give me two unique fields associated with one db.Model object.

My implementation tried to solve this with a static method that creates a ParentEntity and assigns one of my unique values to its key_name. In step two, I create a child entity and pass the parent entity as its parent parameter. Both steps run inside one db transaction, so I assumed the uniqueness constraint would hold, since my two values were stored in the key_name fields of two separate models.

The commenter explained why this would not work: once a child entity has a parent, its key_name is no longer unique across the entire model, only among the children of that particular parent. Bummer...
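To see why, it helps to think of a Datastore key as the entity's full ancestor path rather than just its key_name. The sketch below models keys as plain tuples of (kind, name) pairs; the `make_key` helper is hypothetical and only imitates the shape of a key path, it is not the App Engine `db.Key` API.

```python
def make_key(kind, name, parent=None):
    """Build a toy key path: the parent's path followed by (kind, name).

    This only imitates the shape of a Datastore key; it is not db.Key.
    """
    path = parent if parent is not None else ()
    return path + ((kind, name),)


# Two different parents, each with a child whose key_name is 'bob'.
parent_a = make_key('ParentEntity', 'alice@example.com')
parent_b = make_key('ParentEntity', 'carol@example.com')
child_a = make_key('ChildEntity', 'bob', parent=parent_a)
child_b = make_key('ChildEntity', 'bob', parent=parent_b)

# Same kind and key_name, but different full keys, so both can coexist.
print(child_a == child_b)  # False
```

Because the parent is part of the key, "unique key_name" only ever means "unique among siblings".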

I believe that I could solve this new problem by changing how these two models are associated with one another.

First, I create a parent object as mentioned above. Next, I create a child entity and assign my second unique value to its key_name. The difference is that the second entity has a reference property pointing to the first model: I assign the first entity to that ReferenceProperty, not to the parent parameter. This does not enforce a one-to-one relationship, but it does keep both of my values unique, and I can manage the one-to-one nature of these objects as long as I control the insertion process from within a transaction.

This new solution is still problematic. According to the GAE Datastore documentation, a single transaction cannot update multiple entities unless they all belong to the same entity group. Since my first entity is no longer the parent of the second, the two are no longer in the same entity group and cannot be inserted within the same transaction.
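The entity-group rule can be modelled with the same kind of toy key paths: an entity's group is identified by the root (first element) of its key path, and a single-group transaction may only touch keys that share one root. The helpers below (`make_key`, `entity_group`, `check_single_group`) are hypothetical and only mirror the rule as described above; they are not App Engine calls.

```python
def make_key(kind, name, parent=None):
    """Toy key path: parent path + (kind, name). Not the real db.Key."""
    path = parent if parent is not None else ()
    return path + ((kind, name),)


def entity_group(key):
    """Toy rule: the root element of the key path names the entity group."""
    return key[0]


def check_single_group(keys):
    """Raise ValueError if the keys belong to more than one entity group."""
    groups = set(entity_group(k) for k in keys)
    if len(groups) > 1:
        raise ValueError('transaction spans %d entity groups' % len(groups))
    return groups.pop()


# Second attempt from the question: a ReferenceProperty instead of a parent,
# so both entities are root entities, each in its own group.
prt = make_key('ParentEntity', 'string1')
child = make_key('ChildEntity', 'string2')
try:
    check_single_group([prt, child])
except ValueError as err:
    print(err)  # transaction spans 2 entity groups
```

A reference property links the two records logically, but it does not change either key path, so it cannot pull them into one group.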

I'm back to square one. What can I do to solve this problem? Specifically, how can I enforce two unique values associated with one Model entity? As you can see, I am willing to get a bit creative. Can this be done? I know this will involve an out-of-the-box solution, but there has to be a way.

Below is the original code from the question I posted last week, with a few comments and changes added to implement my second attempt at solving this problem.

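    from google.appengine.ext import db


    class ParentEntity(db.Model):
        str1_key = db.StringProperty()
        str2 = db.StringProperty()

        @staticmethod
        def InsertData(string1, string2, string3):
            def txn():
                # Create the first entity; string1 is its key_name.
                prt = ParentEntity(
                    key_name=string1,
                    str1_key=string1,
                    str2=string2)
                prt.put()

                # Create the user account entity; string2 is its key_name.
                child = ChildEntity(
                    key_name=string2,
                    # parent=prt,  # prt was previously the parent of child
                    parentEnt=prt,
                    str1=string1,
                    str2_key=string2,
                    str3=string3)
                child.put()
                return child

            # This should raise an error, because the two entities are no
            # longer in the same entity group. :(
            return db.run_in_transaction(txn)


    class ChildEntity(db.Model):
        # Foreign and primary key values
        str1 = db.StringProperty()
        str2_key = db.StringProperty()

        # This is no longer a "parent" but a reference
        parentEnt = db.ReferenceProperty(reference_class=ParentEntity)

        # Pertinent data below
        str3 = db.StringProperty()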

class ChildEntity(db.Model):
#foreign and primary key values
str1 = db.StringProperty()
str2_key = db.StringProperty()

#This is no longer a "parent" but a reference
parentEnt = db.ReferenceProperty(reference_class=ParentEntity)
#pertinent data below
str3 = db.StringProperty()

RLH
Answer

After scratching my head a bit, last night I decided to go with the following solution. I assume it still adds some undesirable overhead in many scenarios, but I think that overhead is acceptable for my needs.

The code posted below is a further modification of the code in my question. Most notably, I've created another Model class named EGEnforcer (short for Entity Group Enforcer).

The idea is simple: if a transaction can only update multiple records when they belong to one entity group, I must find a way to put every record that holds one of my unique values into the same entity group.

To do this, I create an EGEnforcer entry when the application first starts. Then, whenever I need to make a new entry in my models, I fetch the EGEnforcer record associated with my paired models and make it the parent of both new records. Voilà! My data is now all in the same entity group.
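Using the same toy key paths as a sanity check, hanging both records off one shared anchor gives them the same root and therefore the same entity group. `make_key` and `entity_group` are hypothetical stand-ins for illustration, not App Engine functions; the anchor name simply reuses `KEY_NAME_EXAMPLE` from the code below.

```python
def make_key(kind, name, parent=None):
    """Toy key path: parent path + (kind, name). Not the real db.Key."""
    path = parent if parent is not None else ()
    return path + ((kind, name),)


def entity_group(key):
    """Toy rule: the root element of the key path names the entity group."""
    return key[0]


# One long-lived anchor record, created once (like EGEnforcer.setup()).
anchor = make_key('EGEnforcer', 'arbitrary unique value')

# Both records hang off the same anchor.
first = make_key('FirstEntity', 'string1', parent=anchor)
second = make_key('SecondEntity', 'string2', parent=anchor)

# Same root, so both writes can share one transaction.
print(entity_group(first) == entity_group(second))  # True
```

The design trade-off is that every account insert now writes into that single entity group, so concurrent inserts contend with one another; whether that matters depends on how often accounts are created.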

Since the *key_name* parameter is unique only within a given parent, this should enforce my uniqueness constraints, because all of my FirstEntity (previously ParentEntity) entries have the same parent. Likewise, each SecondEntity (previously ChildEntity) should have a unique value stored as its key_name, because its parent is also always the same.
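One caveat when relying on key_name for uniqueness: in the Datastore, `put()` on a key that already exists overwrites the stored entity instead of raising an error, so a duplicate key_name does not by itself make the transaction fail. The insert therefore needs to read the key first and abort if it is taken; on App Engine that read would be something like `FirstEntity.get_by_key_name(string1, parent=ege)` inside `txn()`. The sketch below shows the check against a toy dict-backed store; `ToyStore`, `insert_unique`, and `DuplicateKeyError` are hypothetical names for illustration only.

```python
class DuplicateKeyError(Exception):
    """Raised when a key is already taken (hypothetical, for this sketch)."""


class ToyStore(object):
    """Dict-backed stand-in for the Datastore; put() overwrites silently."""

    def __init__(self):
        self.entities = {}

    def get(self, key):
        return self.entities.get(key)

    def put(self, key, value):
        # Like a real put(): replaces any existing entity with the same key.
        self.entities[key] = value


def insert_unique(store, key, value):
    """Write value under key only if the key is not already in use."""
    if store.get(key) is not None:
        raise DuplicateKeyError(repr(key))
    store.put(key, value)


store = ToyStore()
key = (('EGEnforcer', 'arbitrary unique value'), ('FirstEntity', 'alice'))
insert_unique(store, key, {'str2': 'first'})
try:
    insert_unique(store, key, {'str2': 'second'})
except DuplicateKeyError:
    print('duplicate rejected')
print(store.get(key))  # {'str2': 'first'}
```

The read-then-write is only race-free when both steps run in the same transaction on the same entity group, which is exactly what the shared EGEnforcer parent makes possible.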

Since both entities also have the same parent, I can write them within the same transaction. If one write fails, they all fail.
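The all-or-nothing behaviour can be illustrated with a toy transaction that applies writes to a scratch copy and commits only if the whole body finishes. `run_in_toy_transaction` and `insert_pair` are hypothetical helpers; this single-threaded sketch does not model concurrent commits or retries, which the real `db.run_in_transaction` handles.

```python
class DuplicateKeyError(Exception):
    """Raised when a key is already taken (hypothetical, for this sketch)."""


def run_in_toy_transaction(store, fn):
    """Run fn against a scratch copy of store; commit only if fn returns.

    Any exception raised by fn leaves the original store unchanged.
    """
    scratch = dict(store)
    result = fn(scratch)
    store.clear()
    store.update(scratch)
    return result


def insert_pair(string1, string2):
    """Build a transaction body that writes both records or neither."""
    anchor = ('EGEnforcer', 'arbitrary unique value')

    def txn(s):
        for key in ((anchor, ('FirstEntity', string1)),
                    (anchor, ('SecondEntity', string2))):
            if key in s:
                raise DuplicateKeyError(repr(key))
            s[key] = {'str1': string1, 'str2': string2}
        return True
    return txn


store = {}
run_in_toy_transaction(store, insert_pair('alice', 'a-id'))
try:
    # 'bob' is new but 'a-id' is taken, so nothing from this call is kept.
    run_in_toy_transaction(store, insert_pair('bob', 'a-id'))
except DuplicateKeyError:
    pass
print(len(store))  # 2
```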

    from google.appengine.ext import db


    # My new class, holding a unique entry for each pair of models
    # associated with one another.
    class EGEnforcer(db.Model):
        KEY_NAME_EXAMPLE = 'arbitrary unique value'

        @staticmethod
        def setup():
            '''Only needs to be called once for the lifetime of the application.

            Inserts the record used as the parent of every FirstEntity and
            SecondEntity entry.
            '''
            ege = EGEnforcer.get_or_insert(EGEnforcer.KEY_NAME_EXAMPLE)
            return ege


    class FirstEntity(db.Model):
        str1_key = db.StringProperty()
        str2 = db.StringProperty()

        @staticmethod
        def InsertData(string1, string2, string3):
            def txn():
                # Assumes setup() has already run; otherwise ege is None and
                # the entities below would become root entities.
                ege = EGEnforcer.get_by_key_name(EGEnforcer.KEY_NAME_EXAMPLE)
                prt = FirstEntity(
                    key_name=string1,
                    parent=ege)  # Our EGEnforcer record.
                prt.put()

                child = SecondEntity(
                    key_name=string2,
                    parent=ege,  # Our EGEnforcer record.
                    parentEnt=prt,
                    str1=string1,
                    str2_key=string2,
                    str3=string3)
                child.put()
                return child

            # This works because both entities are now in the same entity group.
            return db.run_in_transaction(txn)


    class SecondEntity(db.Model):
        # Foreign and primary key values
        str1 = db.StringProperty()
        str2_key = db.StringProperty()

        # This is no longer a "parent" but a reference
        parentEnt = db.ReferenceProperty(reference_class=FirstEntity)

        # Other data...
        str3 = db.StringProperty()

One quick note: Nick Johnson pinned down exactly why I need this solution:

This solution may be sufficient to your needs - for instance, if you need to enforce that every user has a unique email address, but this is not your primary identifier for a user, you can insert a record into an 'emails' table first, then if that succeeds, insert your primary record.
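The two-step approach in the quoted comment can be sketched like this: claim the unique value in its own table first, then write the primary record only if the claim succeeded. The dict-based tables and the `claim`, `create_user`, and `AlreadyTaken` names are hypothetical; on App Engine each step would be a separate Datastore write, and nothing here is the real API.

```python
class AlreadyTaken(Exception):
    """Raised when the unique value is claimed by someone else (hypothetical)."""


def claim(table, value, owner):
    """Step 1: record value -> owner unless another owner already holds it."""
    holder = table.get(value)
    if holder is not None and holder != owner:
        raise AlreadyTaken(value)
    table[value] = owner


def create_user(emails, users, user_id, email):
    """Claim the email first; write the primary user record only on success."""
    claim(emails, email, user_id)
    users[user_id] = {'email': email}


emails, users = {}, {}
create_user(emails, users, 'u1', 'a@example.com')
try:
    create_user(emails, users, 'u2', 'a@example.com')
except AlreadyTaken:
    print('email already in use')
print(sorted(users))  # ['u1']
```

Letting the same owner re-claim its own value keeps the first step safe to retry; the trade-off versus a single transaction is that a failure in step 2 leaves the claim behind until it is retried or cleaned up.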

This is exactly what I need, but my solution is, obviously, a bit different from your suggestion: my method lets the whole transaction either succeed or fail completely. Specifically, when a user creates an account, they first log in to their Google account. Next, they are sent to the account creation page if there is no entry associated with their Google account in SecondEntity (which is actually UserAccount in my real scenario). If the insertion process fails, they are redirected to the creation page with the reason for the failure.

This could be because their ID is not unique or, potentially, because of a transaction timeout. If inserting a new user account times out, I will want to know about it, and I will implement some form of checks and balances in the near future. For now I simply want to go live, but this uniqueness constraint is an absolute necessity.

Since my approach is strictly for account creation, and my user account data will not change once created, I believe this should work and scale well for quite a while. I'm open to comments if this is incorrect.