How to Quickly Deduplicate a List in .NET
<h2 class="md-end-block md-heading"><span class="md-plain md-expand" style="font-size: 16px">Introduction</span></h2><p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Removing duplicate elements from a collection is a common data-processing task. .NET 6 introduced the <span class="md-pair-s"><code>DistinctBy</code><span class="md-plain"> method (also available in .NET 7 and later), a practical LINQ addition that deduplicates a collection by a specified key.</span></span></span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">This article explains how to use <span class="md-pair-s"><code>DistinctBy</code><span class="md-plain"> and walks through concrete examples of applying it in real development.</span></span></span></p>
<h2 class="md-end-block md-heading"><span class="md-plain" style="font-size: 16px">Main Content</span></h2>
<h3><strong>1. The <code>DistinctBy</code> Method</strong></h3>
<p class="md-end-block md-p"><span class="md-pair-s" style="font-size: 16px"><code>DistinctBy</code><span class="md-plain"> lets a LINQ query deduplicate the elements of a collection by a key computed from each element.</span></span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">It returns a lazily evaluated sequence that contains one element for each distinct key.</span></p>
<p class="md-end-block md-p"><span class="md-pair-s md-expand" style="font-size: 16px"><strong>Method signature</strong></span></p>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">public</span> <span style="color: rgba(0, 0, 255, 1)">static</span> IEnumerable<TSource> DistinctBy<TSource, TKey><span style="color: rgba(0, 0, 0, 1)">(
</span><span style="color: rgba(0, 0, 255, 1)">this</span> IEnumerable<TSource><span style="color: rgba(0, 0, 0, 1)"> source,
Func</span><TSource, TKey><span style="color: rgba(0, 0, 0, 1)"> keySelector
);</span></pre>
</div>
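<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Alongside the signature above, <code>DistinctBy</code> has an overload that also accepts an <code>IEqualityComparer<TKey></code> for comparing keys. The following is a minimal sketch of that overload; the sample names are made up for illustration.</span></p>

```csharp
using System;
using System.Linq;

// Hypothetical sample data: the same name written with different casing.
var names = new[] { "Alice", "alice", "Bob", "ALICE" };

// Default comparer: strings that differ only in casing are different keys.
var caseSensitive = names.DistinctBy(n => n).ToList();

// StringComparer.OrdinalIgnoreCase treats "Alice", "alice", and "ALICE" as one key.
var caseInsensitive = names.DistinctBy(n => n, StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine(string.Join(", ", caseSensitive));   // Alice, alice, Bob, ALICE
Console.WriteLine(string.Join(", ", caseInsensitive)); // Alice, Bob
```

<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Passing a comparer keeps the key selector simple and avoids normalizing the key by hand, for example with <code>ToLowerInvariant()</code>.</span></p>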
<h3 class="md-end-block md-heading"><strong><span class="md-plain" style="font-size: 16px">2. Basic Usage</span></strong></h3>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">The simplest usage is to call <span class="md-pair-s"><code>DistinctBy</code><span class="md-plain"> directly in a LINQ query and then work with the deduplicated sequence.</span></span></span></p>
<p class="md-end-block md-p"><span class="md-pair-s" style="font-size: 16px"><strong>Example</strong></span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Suppose we have a list of users and want to remove users with duplicate names.</span></p>
<div class="cnblogs_code">
<pre>using System;
using System.Collections.Generic;
using System.Linq;

var users = new List<User>
{
    new User { Name = "Alice", Age = 25 },
    new User { Name = "Bob", Age = 32 },
    new User { Name = "Alice", Age = 28 },
    new User { Name = "David", Age = 35 }
};

// Keep the first user seen for each distinct Name.
var distinctUsers = users.DistinctBy(user => user.Name);

foreach (var user in distinctUsers)
{
    Console.WriteLine($"Name: {user.Name}, Age: {user.Age}");
}

// In a file with top-level statements, type declarations must come after the statements.
class User
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}</pre>
</div>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Output:</span></p>
<div class="cnblogs_code">
<pre>Name: Alice, Age: <span style="color: rgba(128, 0, 128, 1)">25</span><span style="color: rgba(0, 0, 0, 1)">
Name: Bob, Age: </span><span style="color: rgba(128, 0, 128, 1)">32</span><span style="color: rgba(0, 0, 0, 1)">
Name: David, Age: </span><span style="color: rgba(128, 0, 128, 1)">35</span></pre>
</div>
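<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">The elements that survive the filter keep their original relative order, and for each key the first occurrence is the one retained (Alice, 25 rather than Alice, 28). The source code shows why.</span></p>
<p class="md-end-block md-p"><span class="md-pair-s" style="font-size: 16px"><strong>Source code</strong></span></p>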
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">private</span> <span style="color: rgba(0, 0, 255, 1)">static</span> IEnumerable<TSource> DistinctByIterator<TSource, TKey>(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>?<span style="color: rgba(0, 0, 0, 1)"> comparer)
{
</span><span style="color: rgba(0, 0, 255, 1)">using</span> IEnumerator<TSource> enumerator =<span style="color: rgba(0, 0, 0, 1)"> source.GetEnumerator();
</span><span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (enumerator.MoveNext())
{
</span><span style="color: rgba(0, 0, 255, 1)">var</span> <span style="color: rgba(0, 0, 255, 1)">set</span> = <span style="color: rgba(0, 0, 255, 1)">new</span> HashSet<TKey><span style="color: rgba(0, 0, 0, 1)">(DefaultInternalSetCapacity, comparer);
</span><span style="color: rgba(0, 0, 255, 1)">do</span><span style="color: rgba(0, 0, 0, 1)">
{
TSource element </span>=<span style="color: rgba(0, 0, 0, 1)"> enumerator.Current;
</span><span style="color: rgba(0, 0, 255, 1)">if</span> (<span style="color: rgba(0, 0, 255, 1)">set</span><span style="color: rgba(0, 0, 0, 1)">.Add(keySelector(element)))
{
</span><span style="color: rgba(0, 0, 255, 1)">yield</span> <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> element;
}
}
</span><span style="color: rgba(0, 0, 255, 1)">while</span><span style="color: rgba(0, 0, 0, 1)"> (enumerator.MoveNext());
}
}</span></pre>
</div>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">The source shows that a <span class="md-pair-s"><code>HashSet</code><span class="md-plain"> performs the deduplication and that elements are yielded in their original order.</span></span></span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Removing duplicates while preserving the original order is a frequent requirement when processing collections.</span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">A <span class="md-pair-s"><code>HashSet</code><span class="md-plain"> achieves this efficiently.</span></span></span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">For each element, the iterator tries to add its key to the <span class="md-pair-s"><code>HashSet</code><span class="md-plain">. If the add succeeds, the key has not appeared before and the element is yielded;</span></span></span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">if the add fails, an equal key already exists and the element is filtered out.</span></p>
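<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">The same idea can be sketched by hand, which may help on frameworks older than .NET 6. The helper name <code>DistinctByKey</code> and the class <code>SketchExtensions</code> are hypothetical, chosen so they do not clash with the built-in method; this is a simplified illustration without argument validation or comparer support, not the runtime's implementation.</span></p>

```csharp
using System;
using System.Collections.Generic;

var words = new[] { "apple", "avocado", "banana", "blueberry", "cherry" };

// Keep the first word for each starting letter.
foreach (var word in words.DistinctByKey(w => w[0]))
{
    Console.WriteLine(word); // prints apple, banana, cherry on separate lines
}

static class SketchExtensions
{
    // Hypothetical helper that mirrors the HashSet-based approach described above.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var element in source)
        {
            // Add returns true only the first time a key is encountered.
            if (seen.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}
```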
<h3 class="md-end-block md-heading"><strong><span class="md-plain" style="font-size: 16px">3. Advanced Usage</span></strong></h3>
<p class="md-end-block md-p"><span class="md-pair-s" style="font-size: 16px"><code>DistinctBy</code><span class="md-plain"> also supports more complex deduplication logic, such as deduplicating by several properties at once.</span></span></p>
<p class="md-end-block md-p"><span class="md-pair-s " style="font-size: 16px"><strong>Example</strong></span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Suppose we have a list of orders and want to remove orders that share both the customer name and the order amount.</span></p>
<div class="cnblogs_code">
<pre>using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<Order>
{
    new Order { OrderId = 1, CustomerName = "Alice", Amount = 100.0m },
    new Order { OrderId = 2, CustomerName = "Bob", Amount = 150.0m },
    new Order { OrderId = 3, CustomerName = "Alice", Amount = 100.0m },
    new Order { OrderId = 4, CustomerName = "Charlie", Amount = 120.0m },
    new Order { OrderId = 5, CustomerName = "Bob", Amount = 150.0m }
};

// A value tuple serves as a composite key: two orders are duplicates only when both fields match.
var distinctOrders = orders.DistinctBy(order => (order.CustomerName, order.Amount));

foreach (var order in distinctOrders)
{
    Console.WriteLine($"Order ID: {order.OrderId}, Customer: {order.CustomerName}, Amount: {order.Amount}");
}

// Type declarations follow the top-level statements.
class Order
{
    public int OrderId { get; set; }
    public string CustomerName { get; set; } = "";
    public decimal Amount { get; set; }
}</pre>
</div>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Output:</span></p>
<div class="cnblogs_code">
<pre>Order ID: <span style="color: rgba(128, 0, 128, 1)">1</span>, Customer: Alice, Amount: <span style="color: rgba(128, 0, 128, 1)">100.0</span><span style="color: rgba(0, 0, 0, 1)">
Order ID: </span><span style="color: rgba(128, 0, 128, 1)">2</span>, Customer: Bob, Amount: <span style="color: rgba(128, 0, 128, 1)">150.0</span><span style="color: rgba(0, 0, 0, 1)">
Order ID: </span><span style="color: rgba(128, 0, 128, 1)">4</span>, Customer: Charlie, Amount: <span style="color: rgba(128, 0, 128, 1)">120.0</span></pre>
</div>
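<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Because <code>DistinctBy</code> keeps the first element it sees for each key, sorting beforehand controls which duplicate survives. The sketch below keeps the order with the highest <code>OrderId</code> per key; it uses named tuples instead of the <code>Order</code> class only to stay self-contained.</span></p>

```csharp
using System;
using System.Linq;

// Illustrative data modeled on the orders above.
var orders = new[]
{
    (OrderId: 1, CustomerName: "Alice", Amount: 100.0m),
    (OrderId: 2, CustomerName: "Bob", Amount: 150.0m),
    (OrderId: 3, CustomerName: "Alice", Amount: 100.0m),
};

// Sort so the preferred duplicate comes first, then deduplicate by the composite key.
var latestPerKey = orders
    .OrderByDescending(o => o.OrderId)
    .DistinctBy(o => (o.CustomerName, o.Amount));

foreach (var o in latestPerKey)
{
    Console.WriteLine($"Order ID: {o.OrderId}, Customer: {o.CustomerName}");
}
// Order ID: 3, Customer: Alice
// Order ID: 2, Customer: Bob
```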
<h3 class="md-end-block md-heading"><strong><span class="md-plain" style="font-size: 16px">4. Performance Considerations</span></strong></h3>
<p class="md-end-block md-p"><span class="md-pair-s" style="font-size: 16px"><code>DistinctBy</code><span class="md-plain"> tracks the keys it has already seen in a hash set, so it makes a single pass over the source and performs well in most cases. The set grows with the number of distinct keys, so memory usage still deserves attention on very large data sets.</span></span></p>
<p class="md-end-block md-p"><span class="md-pair-s" style="font-size: 16px"><strong>Example</strong></span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Suppose we have a large collection of ten million records and need to deduplicate it by a key.</span></p>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">var</span> largeCollection = Enumerable.Range(<span style="color: rgba(128, 0, 128, 1)">1</span>, <span style="color: rgba(128, 0, 128, 1)">10000000</span>).Select(i => <span style="color: rgba(0, 0, 255, 1)">new</span> { Id = i, Value = i % <span style="color: rgba(128, 0, 128, 1)">1000</span><span style="color: rgba(0, 0, 0, 1)"> });
</span><span style="color: rgba(0, 0, 255, 1)">var</span> distinctLargeCollection = largeCollection.DistinctBy(item =><span style="color: rgba(0, 0, 0, 1)"> item.Value);
Console.WriteLine($</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">Distinct count: {distinctLargeCollection.Count()}</span><span style="color: rgba(128, 0, 0, 1)">"</span>);</pre>
</div>
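<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">One related point is that <code>DistinctBy</code> uses deferred execution: each enumeration of the result scans the source again and rebuilds the key set. The sketch below counts key-selector calls to make this visible; the counter and the smaller range are illustrative choices. When the result is needed more than once, materializing it with <code>ToList()</code> avoids repeated scans.</span></p>

```csharp
using System;
using System.Linq;

var selectorCalls = 0;

var distinct = Enumerable.Range(1, 1_000_000).DistinctBy(i =>
{
    selectorCalls++; // count how often the key selector runs
    return i % 1000;
});

Console.WriteLine(selectorCalls);      // 0: nothing has been enumerated yet
Console.WriteLine(distinct.Count());   // 1000
Console.WriteLine(distinct.Count());   // 1000, after scanning the source a second time
Console.WriteLine(selectorCalls);      // 2000000

// Materialize once when the deduplicated result is reused.
var materialized = distinct.ToList();
Console.WriteLine(materialized.Count); // 1000
```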
<h3 class="md-end-block md-heading"><strong><span class="md-plain" style="font-size: 16px">5. Use with Asynchronous Sequences</span></strong></h3>
<p class="md-end-block md-p"><span class="md-pair-s" style="font-size: 16px"><code>Enumerable.DistinctBy</code><span class="md-plain"> itself is defined only for <code>IEnumerable<T></code>. For an <span class="md-pair-s"><code>IAsyncEnumerable<T></code><span class="md-plain"> source, the same key-based deduplication can be applied while the stream is consumed with <code>await foreach</code>. Async LINQ libraries such as the System.Linq.Async NuGet package add query operators for <code>IAsyncEnumerable<T></code>, but which key-based distinct operator is available, and under what name, depends on the library and version.</span></span></span></span></p>
<p class="md-end-block md-p"><span class="md-pair-s " style="font-size: 16px"><strong>Example</strong></span></p>
<p class="md-end-block md-p"><span class="md-plain" style="font-size: 16px">Suppose an asynchronous method returns a stream of users and we want to remove users with duplicate names.</span></p>
<div class="cnblogs_code">
<pre>using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;

var httpClient = new HttpClient();

// Track the names already seen while the asynchronous stream is consumed.
// User is the class defined in the basic usage example.
var seenNames = new HashSet<string>();
await foreach (var user in GetUsersAsync())
{
    // Add returns false for a name that has already appeared, so duplicates are skipped.
    if (seenNames.Add(user.Name))
    {
        Console.WriteLine($"Name: {user.Name}, Age: {user.Age}");
    }
}

async IAsyncEnumerable<User> GetUsersAsync()
{
    var response = await httpClient.GetAsync("https://api.example.com/users");
    var usersJson = await response.Content.ReadAsStringAsync();
    // Parse the user list with System.Text.Json; fall back to an empty list for a null payload.
    var users = JsonSerializer.Deserialize<List<User>>(usersJson) ?? new List<User>();
    foreach (var user in users)
    {
        yield return user;
    }
}</pre>
</div>
<h2 class="md-end-block md-heading"><span class="md-plain" style="font-size: 16px">Summary</span></h2>
<p class="md-end-block md-p"><span class="md-pair-s" style="font-size: 16px"><code>DistinctBy</code><span class="md-plain">, added to LINQ in .NET 6, is a practical feature. It deduplicates a collection by a specified key directly within a LINQ query, which simplifies code and improves development efficiency.</span></span></p>
<p class="md-end-block md-p md-focus"><span class="md-plain" style="font-size: 16px">Hopefully this article helps you understand and make good use of LINQ's <span class="md-pair-s"><code>DistinctBy</code><span class="md-plain md-expand"> method in your own projects.</span></span></span></p>
Source: https://www.cnblogs.com/1312mn/p/18552496