
Comments and Discussions



Well your second solution is not "always" correct.
Assume that you have a list of 1 element. Obviously, even if there is only one element, checking if "all the numbers in the list are the same" should be true. But your last check will fail since query1.Count() is equal to 1.
modified 18May12 15:47pm.





When there is only one element, there is not difference in the list. The code looks good to me.
Erol





haaa... you're right. my bad
That's what happens when someone is talking to you about annoying things that you don't want to hear about.





Maybe you can fix the vote of 3?
Erol





A solution that is robust regarding double rounding problems and that is O(n) could be (assuming a double tolerance of 1E5):
...
bool isSame = Math.Abs(myList.Average()  myList.Min()) < 1E5;
...
Cheers
Andi





Ew, you're going to iterate the whole List twice?





Yes. Comparing the different approaches to check if all values are equal, leads to quite interesting figures (list size = 1000/100000, time in milliseconds)
Algorithm  all data equal  all data random  bestcase  worstcase 
AverageMin  22/2104  22/2145  linear = O(n)  linear = O(n) 
Distinct.Count  36/3221  89/12676  about linear = O(n)  about O(n*log(n)) 
Loop of Loop  5435/...  0/0  immediate  O(n*n) 
HashSet  40/3764  53/7406  linear = O(n)  linear = O(n) 
The suggested approach gives guaranteed a result in linear time.
[EDIT]
The HashSet approach (as you did suggest in your alternative tip) also seems to be garanteed linear. The problem with a double HashSet is that it is useless since double hash code and equality is unreliable (try adding (1.0/6.0 + 1.0/6.0 + 1.0/6.0 + 1.0/6.0 + 1.0/6.0 + 1.0/6.0) and 1.0 > you get two values (which is incorrect in my eyes)).
[/EDIT]
Cheers
Andi
modified 15May12 17:20pm.





Interesting. Thanks for doing that, Andi.
Erol
Erol





Yes, well I would stick with the basic shortcircuited loop and a test.
What is this "Loop of Loop"?
Where's the test code?





This is the test hack. Please observe that due to the O(n^2) nature of the Loop of Loop approach, I did reduce the n to 100 and 1000 respectively.
Loop of Loop is: search for each entry a not matching entry.
Cheers
Andi
private static void TestAverage(int n, bool rnd)
{
int k = 1000;
List<double> list = new List<double>();
Random r = new Random();
if (rnd) foreach (int i in Enumerable.Range(0, n)) list.Add(r.NextDouble());
else foreach (int i in Enumerable.Range(0, n)) list.Add(n);
Stopwatch s = new Stopwatch();
s.Start();
bool b = true;
foreach (var item in Enumerable.Range(0, k))
{
b ^= (Math.Abs(list.Average()  list.Min()) < 1E3); }
s.Stop();
Console.WriteLine("Average Elapsed: {0} > {1}", n, s.ElapsedMilliseconds);
}
private static void TestDistinct(int n, bool rnd)
{
int k = 1000;
List<double> list = new List<double>();
Random r = new Random();
if (rnd) foreach (int i in Enumerable.Range(0, n)) list.Add(r.NextDouble());
else foreach (int i in Enumerable.Range(0, n)) list.Add(n);
Stopwatch s = new Stopwatch();
s.Start();
bool b = true;
foreach (var item in Enumerable.Range(0, k))
{
b ^= (list.Distinct().Count() == 1); }
s.Stop();
Console.WriteLine("Distinct Elapsed: {0} > {1}", n, s.ElapsedMilliseconds);
}
private static void TestHash(int n, bool rnd)
{
int k = 1000;
List<double> list = new List<double>();
Random r = new Random();
if (rnd) foreach (int i in Enumerable.Range(0, n)) list.Add(r.NextDouble());
else foreach (int i in Enumerable.Range(0, n)) list.Add(n);
Stopwatch s = new Stopwatch();
s.Start();
bool b = true;
foreach (var item in Enumerable.Range(0, k))
{
b ^= (new HashSet<double>(list).Count() == 1); }
s.Stop();
Console.WriteLine("HashSet Elapsed: {0} > {1}", n, s.ElapsedMilliseconds);
}
private static void TestIndex(int n, bool rnd)
{
int k = 1000;
List<double> list = new List<double>();
Random r = new Random();
if (rnd) foreach (int i in Enumerable.Range(0, n)) list.Add(r.NextDouble());
else foreach (int i in Enumerable.Range(0, n)) list.Add(n);
Stopwatch s = new Stopwatch();
s.Start();
foreach (var item in Enumerable.Range(0, k))
{
bool isSame = true; foreach (int x in Enumerable.Range(0, list.Count)) { foreach (int y in Enumerable.Range(x+1, list.Countx1)) { isSame = (list[x] == list[y]); if (!isSame) break; } if (!isSame) break; } }
s.Stop();
Console.WriteLine("Index Elapsed: {0} > {1}", n, s.ElapsedMilliseconds);
}
public static void Main(string[] args)
{
TestAverage(1000, false);
TestAverage(100000, false);
TestDistinct(1000, false);
TestDistinct(100000, false);
TestHash(1000, false);
TestHash(100000, false);
TestIndex(100, false); TestIndex(1000, false);
TestAverage(1000, true);
TestAverage(100000, true);
TestDistinct(1000, true);
TestDistinct(100000, true);
TestHash(1000, true);
TestHash(100000, true);
TestIndex(100, true); TestIndex(1000, true); }
Sample results on my machine:
Average Elapsed: 1000 > 31
Average Elapsed: 100000 > 3167
Distinct Elapsed: 1000 > 41
Distinct Elapsed: 100000 > 3629
HashSet Elapsed: 1000 > 41
HashSet Elapsed: 100000 > 4323
Index Elapsed: 100 > 68
Index Elapsed: 1000 > 6064
Average Elapsed: 1000 > 31
Average Elapsed: 100000 > 3152
Distinct Elapsed: 1000 > 95
Distinct Elapsed: 100000 > 14816
HashSet Elapsed: 1000 > 83
HashSet Elapsed: 100000 > 9860
Index Elapsed: 100 > 0
Index Elapsed: 1000 > 0





After a quick look...
In the first three, you don't do anything with b, so I wonder whether or not the whole thing gets optimized away. Please use the value of b after calculating it.
I don't see and implementation of the OP's first (singleloop) design  which I suspect may outperform all the others.





1st: That is not optimized away  at least no difference can be measured.
2nd: agree, it outperfoms.
Cheers
Andi





Talking about expressive code, how about this (and *not* taking real values in my example):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Solution3
{
class Program
{
static void Main(string[] args)
{
var myList = new List<int>() { 1, 1, 1 };
Console.WriteLine("{0} entries are the same", list.Distinct().Count() == 1
? "All" : "Not all");
Console.ReadKey();
}
}
}





Hello Erol,
never directly compare doubles directly  the native comparison is not reliably due to the fact that the real values do implicit rounding.
E.g. if your list was
double sixth = 1.0/6.0;
double six_sixths = sixth + sixth + sixth + sixth + sixth + sixth;
...
list.Add(1.0);
list.Add(six_sixths);
...
...list.Distinct()...
This would result in the two values 1.0 and 1.0 (the difference of the two values is something 1E16, but the precision of display is 1E15, that's why both numbers are displayed 1.0 but the actual binary values differ).
See the article Demystify C# floatingpoint equality and relation operations[^] for double comparison issues.
There is a Distinct(...) overload that takes an IEqualityComparer, but that one must provide a GetHashCode() function. Providing a Has Code for a fuzzy comparison as needed for double is a bit of a challenge: "same" values shall have same HashCode.
I'm still contemplating on a reliable and relatively cheap GetHashCode() function for relativey equal doubles (or float or decimals).
Cheers
Andi
PS: The same holds for dictionaries or hash tables with real values as keys. This won't really work...





You're absolutely right, Andi. Comparing doubles directly is dangerous, and almost always you need to consider error rounding. Maybe this is another great reason to go with the second solution.
Erol





I dare to disagree. I don't know of any way to make the second work reliably for double : the Distinct() overload requires a IEqualityComparer which must provide a HashCode for each value. There is no known way to me to provide such a HashCode for tolerant double comparison...
Since all the useful linq functions (FirstOrDefault, Contains, Distinct, etc.) most likely work internally on hash tables, they require for the IEqualityComparer  which is not reliably usable for double as stated above.
If your problem really works on double values, you are probably bound to an O(n*n) solution, based on FindIndex() function...
Cheers
Andi





Technically, you don't need to select it prior to doing the distinct. It just adds extra overhead.
I'd probably suggest rewriting it to be something of the sort:
isSame = (list.Distinct().Count() == 1);
Or even (without distinct):
isSame = ((list.Count > 0) && list.All(x => x == list[0]));
Two things to be aware of (albeit I'm sure this was just a vehicle for your example):
 Equality in floating point numbers are notoriously bad.
 You don't handle the case when the list is empty.





True and true. Many of the objects in .Net are rather amazing. The main purpose of the tip was to illustrate LINQ as it resembles SQL, which I truly love as a database language. I'm still trying to come to grips with lambda expressions and anonymous functions; very powerful stuff. Your first rewrite takes advantage of the simplicity of problem, but the LINQ expression maybe applicable to more complex problems. your second rewrite might give the other guy who complained about LINQ readability a heart attack
Erol





Quote: var query1 = (from num in myList select num).Distinct();
Couldn't you just do (myList.Distinct().Count == 1) ?





Yes, you could.
But sometimes engineers like the shiny method, rather than the straightforward, readable, or efficient one. I find this especially common amongst engineers raised on web technologies, which come and go like pop songs.







General News Suggestion Question Bug Answer Joke Rant Admin Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

First Posted  15 May 2012 
Views  8,070 
Bookmarked  2 times 

