Group collections by a property of their elements

  • I have a Customer class:



    public class Customer
    {
        public Guid Id { get; set; }
        // Some other properties...
    }


    And three transaction classes that have a reference to Customer:



    public class Order
    {
        public Guid CustomerId { get; set; }
        // Some other properties...
    }

    public class Invoice
    {
        public Guid CustomerId { get; set; }
        // Some other properties...
    }

    public class Payment
    {
        public Guid CustomerId { get; set; }
        // Some other properties...
    }


    I have a collection for each transaction type, and I want these collections to be "grouped" by the CustomerId property of their elements. As a result, I'd like to get a collection of objects like this:



    public class CustomerInfo
    {
        public CustomerInfo(Guid customerId, IEnumerable<Order> orders,
            IEnumerable<Invoice> invoices, IEnumerable<Payment> payments){...}

        public Guid CustomerId { get; set; }
        public IEnumerable<Invoice> Invoices { get; set; }
        public IEnumerable<Order> Orders { get; set; }
        public IEnumerable<Payment> Payments { get; set; }
    }


    Right now I do this with the following function:



    private IEnumerable<CustomerInfo> _GetCustomerInfo(
        IEnumerable<Payment> payments, IEnumerable<Invoice> invoices,
        IEnumerable<Order> orders)
    {
        var invoicesGroupdByCustomers = invoices.GroupBy(x => x.CustomerId);
        var ordersGroupdByCustomers = orders.GroupBy(x => x.CustomerId);
        var paymentsGroupdByCustomers = payments.GroupBy(x => x.CustomerId);

        var result = new List<CustomerInfo>();
        foreach (var group in invoicesGroupdByCustomers)
            result.Add(new CustomerInfo(group.Key,
                ordersGroupdByCustomers.FirstOrDefault(x => x.Key == group.Key),
                group,
                paymentsGroupdByCustomers.FirstOrDefault(x => x.Key == group.Key)));

        foreach (var group in ordersGroupdByCustomers)
            if (!result.Any(x => x.CustomerId == group.Key))
                result.Add(new CustomerInfo(group.Key,
                    group,
                    invoicesGroupdByCustomers.FirstOrDefault(x => x.Key == group.Key),
                    paymentsGroupdByCustomers.FirstOrDefault(x => x.Key == group.Key)));

        foreach (var group in paymentsGroupdByCustomers)
            if (!result.Any(x => x.CustomerId == group.Key))
                result.Add(new CustomerInfo(group.Key,
                    ordersGroupdByCustomers.FirstOrDefault(x => x.Key == group.Key),
                    invoicesGroupdByCustomers.FirstOrDefault(x => x.Key == group.Key),
                    group));

        return result;
    }


    Here is a sample of the calculations where I use these transaction collections:



    private Dictionary<Guid, decimal> _CalculateSales(
        IEnumerable<CustomerInfo> customersSalesInfo)
    {
        var result = new Dictionary<Guid, decimal>();

        foreach (var customerInfo in customersSalesInfo)
        {
            var sales = customerInfo.Invoices.Sum(x => x.Header.SubTotalAmt) +
                customerInfo.Orders.Sum(x => x.Header.SubTotalAmt) -
                customerInfo.Payments.Sum(x => x.Header.SubTotalAmt);
            result.Add(customerInfo.CustomerId, sales);
        }

        return result;
    }


    Is there a better approach?


  • almaz

    almaz Correct answer

    8 years ago

    If you're developing a production solution, I would first think about how to avoid loading all the customers/invoices/etc. in the first place. If that's the case, please describe the use cases in which CustomerInfo is going to be used, how you get the invoices/orders/payments collections, and whether there is a chance to leverage ORM capabilities (assuming you get the data via some sort of ORM) to load these collections for customers.



    But if we are just talking about a programming exercise, then I would probably go with the following solution (influenced by RavenDB's map/reduce approach):



    IEnumerable<CustomerInfo> infos = (
            from invoice in invoices
            select new { invoice.CustomerId, Invoice = invoice, Order = (Order)null, Payment = (Payment)null }
        ).Concat(
            from order in orders
            select new { order.CustomerId, Invoice = (Invoice)null, Order = order, Payment = (Payment)null }
        ).Concat(
            from payment in payments
            select new { payment.CustomerId, Invoice = (Invoice)null, Order = (Order)null, Payment = payment }
        ).GroupBy(x => x.CustomerId, (key, items) => new CustomerInfo(key,
            // Argument order follows the constructor: orders, invoices, payments.
            items.Select(x => x.Order).Where(o => o != null),
            items.Select(x => x.Invoice).Where(i => i != null),
            items.Select(x => x.Payment).Where(p => p != null)));


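    I can't test this code right now. Note that a bare null in an anonymous object has no type, so each null placeholder needs an explicit type, e.g. (Order)null, for the three projections to end up with the same anonymous type.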
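
    For anyone who wants to try the idea, here is a minimal self-contained sketch of the same tag-concat-group approach. The stub classes, the sample data and the GroupingDemo.Build name are assumptions made for illustration, not part of the original code; the real classes will have more properties.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Minimal stand-ins for the classes in the question (assumed shapes).
    public class Order   { public Guid CustomerId { get; set; } }
    public class Invoice { public Guid CustomerId { get; set; } }
    public class Payment { public Guid CustomerId { get; set; } }

    public class CustomerInfo
    {
        public CustomerInfo(Guid customerId, IEnumerable<Order> orders,
            IEnumerable<Invoice> invoices, IEnumerable<Payment> payments)
        {
            CustomerId = customerId;
            Orders = orders.ToList();
            Invoices = invoices.ToList();
            Payments = payments.ToList();
        }

        public Guid CustomerId { get; }
        public IReadOnlyList<Order> Orders { get; }
        public IReadOnlyList<Invoice> Invoices { get; }
        public IReadOnlyList<Payment> Payments { get; }
    }

    public static class GroupingDemo
    {
        // Hypothetical helper: tags every transaction with its customer id,
        // concatenates the three streams, and groups the result once.
        public static List<CustomerInfo> Build(IEnumerable<Order> orders,
            IEnumerable<Invoice> invoices, IEnumerable<Payment> payments)
        {
            return invoices
                .Select(i => new { i.CustomerId, Invoice = i, Order = (Order)null, Payment = (Payment)null })
                .Concat(orders.Select(o => new { o.CustomerId, Invoice = (Invoice)null, Order = o, Payment = (Payment)null }))
                .Concat(payments.Select(p => new { p.CustomerId, Invoice = (Invoice)null, Order = (Order)null, Payment = p }))
                .GroupBy(x => x.CustomerId, (key, items) => new CustomerInfo(key,
                    items.Select(x => x.Order).Where(o => o != null),
                    items.Select(x => x.Invoice).Where(i => i != null),
                    items.Select(x => x.Payment).Where(p => p != null)))
                .ToList();
        }

        public static void Main()
        {
            var a = Guid.NewGuid();
            var b = Guid.NewGuid();
            var infos = Build(
                new[] { new Order { CustomerId = a } },
                new[] { new Invoice { CustomerId = a }, new Invoice { CustomerId = b } },
                new[] { new Payment { CustomerId = b } });

            foreach (var info in infos)
                Console.WriteLine($"{info.CustomerId}: {info.Orders.Count} orders, " +
                    $"{info.Invoices.Count} invoices, {info.Payments.Count} payments");
        }
    }
    ```

    A customer with no orders (customer b here) simply gets an empty Orders list rather than null, so later Sum calls over it are safe.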



    Update:
    Based on your usage of CustomerInfo you might not actually need it at all (if that's all you do with it) :). I would rather try to reduce the amount of data straight away, thus improving performance and reducing memory usage:



    private Dictionary<Guid, decimal> _CalculateSales(
        IEnumerable<Payment> payments, IEnumerable<Invoice> invoices,
        IEnumerable<Order> orders)
    {
        var result = (
                from invoice in invoices
                select new { invoice.CustomerId, Amount = invoice.Header.SubTotalAmt }
            ).Concat(
                from order in orders
                select new { order.CustomerId, Amount = order.Header.SubTotalAmt }
            ).Concat(
                from payment in payments
                select new { payment.CustomerId, Amount = -payment.Header.SubTotalAmt }
            ).GroupBy(x => x.CustomerId)
            .ToDictionary(g => g.Key, g => g.Sum(x => x.Amount));
        return result;
    }


    The benefit of this solution compared to the original code is that it iterates over each collection only once, and the only Dictionary created is the actual result.
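
    To see the signed-amount idea on concrete numbers, here is a small runnable sketch. The Header stub, the sample amounts and the SalesDemo.CalculateSales name are assumptions for illustration; the only things taken from the question are CustomerId, Header.SubTotalAmt and the decimal result type.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Assumed shapes: the question only shows CustomerId and Header.SubTotalAmt.
    public class Header  { public decimal SubTotalAmt { get; set; } }
    public class Order   { public Guid CustomerId { get; set; } public Header Header { get; set; } }
    public class Invoice { public Guid CustomerId { get; set; } public Header Header { get; set; } }
    public class Payment { public Guid CustomerId { get; set; } public Header Header { get; set; } }

    public static class SalesDemo
    {
        // Invoices and orders count as positive amounts, payments as negative;
        // one GroupBy over the concatenated stream then sums per customer.
        public static Dictionary<Guid, decimal> CalculateSales(
            IEnumerable<Payment> payments, IEnumerable<Invoice> invoices,
            IEnumerable<Order> orders)
        {
            return invoices.Select(i => new { i.CustomerId, Amount = i.Header.SubTotalAmt })
                .Concat(orders.Select(o => new { o.CustomerId, Amount = o.Header.SubTotalAmt }))
                .Concat(payments.Select(p => new { p.CustomerId, Amount = -p.Header.SubTotalAmt }))
                .GroupBy(x => x.CustomerId)
                .ToDictionary(g => g.Key, g => g.Sum(x => x.Amount));
        }

        public static void Main()
        {
            var a = Guid.NewGuid();
            var b = Guid.NewGuid();

            var sales = CalculateSales(
                new[]
                {
                    new Payment { CustomerId = a, Header = new Header { SubTotalAmt = 30m } },
                    new Payment { CustomerId = b, Header = new Header { SubTotalAmt = 5m } }
                },
                new[]
                {
                    new Invoice { CustomerId = a, Header = new Header { SubTotalAmt = 100m } },
                    new Invoice { CustomerId = b, Header = new Header { SubTotalAmt = 20m } }
                },
                new[]
                {
                    new Order { CustomerId = a, Header = new Header { SubTotalAmt = 50m } }
                });

            Console.WriteLine(sales[a]); // 100 + 50 - 30
            Console.WriteLine(sales[b]); // 20 - 5
        }
    }
    ```

    With this sample data, customer a nets 100 + 50 - 30 = 120 and customer b nets 20 - 5 = 15, which matches what the original invoices + orders - payments formula would produce.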


    I really need to load all transactions. The task is to calculate sales for all customers for some period of time. So first I get the list of all transactions for that period, then group them by customer, then calculate sales for each customer. Thanks for the nice code.

    In this case I would rather use DB capabilities to group transactions by CustomerId, unless you are doing some complex calculations there.

    Can you provide a sample of calculations you're doing?

    The problem is that I am getting the transactions not from a DB but from a really dumb web service that can't do grouping. I've updated the question with a sample of the calculations.

    I've updated my response according to your update

    Thanks a lot for this solution. Since the number of transactions can be really big, it can save a lot of time. I'll compare it with svick's code.

License under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM
