Tono Nam Tono Nam - 1 month ago 5
C# Question

Join tables that contain no relationship

I have the following database:

enter image description here


  1. Table Messages:


    • Responsible for storing messages sent do different email addresses.


  2. Table Emails:


    • Only responsible for storing email addresses (primary key = Email)
      (toEmail has a relationship with table emails) (Sotres Foo@gmail.com, Foo2@gmail.com etc...)


  3. Table Contacts:


    • Used to store user's contacts. (stores mark, John, Tom, etc...)


  4. Table ContactEmails


    • Need this table because a contact may have multiple email addresses. (Stores the ids found in table Emails)




Anyways here is the question:
I want to create a query that will select all the messages that are sent during specific dates. I want to include the contact name in the query if it exists. I have crated the following query but it is to slow:

Func<string, Contact> tryGetContact = (email)=>{
var contactEmail = db.ContactEmails.FirstOrDefault(x=>x.IdEmail==email);
if(contactEmail==null)
return null;
return contactEmail.Contact; // navigational property created by entity framework.
};

var query = from msg in db.Messages
join email in db.Emails on msg.ToEmail equals email.Email
where msg.Date < "some date" && msg.Date > "some other date"
select new
{
MessageSubject = msg.Subject,
ToEmail = email.Email,
Contact = tryGetContact(email.Email) // this slows down the query!
};


In order to make my query execute faster do I need to store all the contacts in a dictionary and separate this query into 2 queries?

Storing all the contacts in a dictionary will make things much more efficient. But retriving all the contacts from the database where I do not need most of them makes me feel I am wasting resources.

Answer

It would be easier to determine what is slowing down the query if we could see the actual SQL generated by your Linq query. However, I guess that it might have something to do with your tryGetContact Func that is not sharing the same context as the main part of the query.

So if I am right, every time you call tryGetContact(email.Email) a new complete query will be executed because of this line:

var contactEmail = db.ContactEmails.FirstOrDefault(x=>x.IdEmail==email);

In that case the db.ContactEmails is not part of the join in the SQL query so it is reexecuted each time.

So what I would do is to add another join to include the ContactEmails on the Linq (and subsequent SQL) query. This should looks like that:

var query = from msg in db.Messages
            join email in db.Emails on msg.ToEmail equals email.Email
            join contactEmail in db.ContactEmails on contactEmail.IdEmail equals email.Email
            where msg.Date < "some date" && msg.Date > "some other date" 
            select new
            {
                MessageSubject = msg.Subject,
                ToEmail = email.Email,
                Contact = (contactEmail==null) ? contactEmail.Contact : null,
            };

If this doesn't work you might want to execute the db.ContactEmails only once for all your tryGetContact calls and store the result in a Collection (or HashSet for better performance).