Avoid COLLECT(DISTINCT()) - Amazon Neptune
Services or capabilities described in AWS documentation might vary by Region. To see the differences applicable to the AWS European Sovereign Cloud Region, see the AWS European Sovereign Cloud User Guide.

Avoid COLLECT(DISTINCT())

Note

As of engine version 1.4.7.0, this recommended rewrite is no longer needed.

COLLECT(DISTINCT()) is used whenever a list is to be formed containing distinct values. COLLECT is an aggregation function, and grouping is done based on additional keys being projected in the same statement. When distinct is used, the input is split in multiple chunks where each chunk denotes one group for reduction. Performance will be impacted as the number of groups increases. In Neptune, it is much more efficient to perform DISTINCT before actually collecting/forming the list. This allows grouping to be done directly on the grouping keys for the whole chunk.

Consider the following query:

MATCH (n:Person)-[:commented_on]->(p:Post) WITH n, collect(distinct(p.post_id)) as post_list RETURN n, post_list

A more optimal way of writing this query is:

MATCH (n:Person)-[:commented_on]->(p:Post) WITH DISTINCT n, p.post_id as postId WITH n, collect(postId) as post_list RETURN n, post_list