How to Implement an Elasticsearch Aggregation Query in Liferay DXP


When working with large datasets in Liferay DXP, you may encounter challenges like duplicate data in search results. Elasticsearch Aggregation Queries offer a powerful solution, allowing you to retrieve distinct records by grouping data efficiently. This approach optimizes performance and enhances the clarity of search results, ensuring unique values are returned.

Why Aggregation Queries?

Aggregation queries are useful whenever you need to summarize or filter large amounts of indexed Liferay data without redundancy. By grouping records on a specific field, an aggregation returns one bucket per distinct value, together with a document count, instead of a list of documents that repeat the same value. This is especially useful for generating reports, search facets, and dashboards in Liferay DXP.

Prerequisites

  • Basic knowledge of Liferay
  • Liferay DXP/Portal 7.4+
  • Elasticsearch Setup
  • Kibana (optional)
  • Basic Elasticsearch Query Knowledge
  • Indexing Sample Data

Creating Categories

First, define a vocabulary called "News," which contains three categories: "Sports," "Current Affairs," and "Entertainment." These categories will be used to organize your News web content.

News Vocabulary

Creating the News Structure

Next, create a structure for your news articles, including fields for the title and description. This structure will be used to define the format of the news articles.

News Structure

Creating Web Contents

Create a few web contents, assign them appropriate descriptions, and select the predefined categories for each content.

Sport category Web Content

Creating Elasticsearch Query

In this step, we'll construct an Elasticsearch Aggregation Query to fetch the distinct categories selected in the web content.

Elasticsearch Aggregation Query

Query Breakdown

  • query:

    Uses a bool query with must clauses to apply multiple filters:

    • entryClassName: Matches "com.liferay.journal.model.JournalArticle" to target Liferay journal articles.
    • ddmStructureKey: Matches "34115", focusing on the News structure.
    • latest: Matches "true" to ensure only the latest versions of articles are retrieved.
  • aggs (Aggregations):

    Defines a terms aggregation called "sports":

    • field: "assetVocabularyCategoryIds" groups results by assetVocabularyCategoryIds.
    • include: "34105-.*" applies a regex that keeps only the keys starting with "34105-", that is, the categories that belong to the vocabulary with ID 34105. The Java implementation later in this article builds the same pattern from the vocabulary ID.
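
A query matching this breakdown might look like the following sketch. It is reconstructed from the description above rather than copied from a verified source: the use of term clauses is an assumption (chosen because the Java implementation later in this article uses term queries), and the IDs "34115" and "34105" are the example values from this article, so yours will differ. In Kibana it would be sent as the body of a _search request against your Liferay index, whose name depends on your installation.

```json
{
  "query": {
    "bool": {
      "must": [
        { "term": { "entryClassName": "com.liferay.journal.model.JournalArticle" } },
        { "term": { "ddmStructureKey": "34115" } },
        { "term": { "latest": "true" } }
      ]
    }
  },
  "aggs": {
    "sports": {
      "terms": {
        "field": "assetVocabularyCategoryIds",
        "include": "34105-.*"
      }
    }
  }
}
```

The response should then contain an "aggregations" section with one bucket per matching category key and its document count.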

Kibana Query Response

Here, you can visualize the aggregation results in Kibana.

Kibana Query Response Image

Implementing Elasticsearch Query in Java

  1. Before querying Elasticsearch, we need to fetch the ID of the "News" vocabulary using Liferay's AssetVocabularyLocalService:

    private long getVocabularyIdFromName(String vocabularyName, long groupId) {
        // fetchGroupVocabulary returns null when the group has no vocabulary with this name
        AssetVocabulary vocabulary = assetVocabularyLocalService.fetchGroupVocabulary(groupId, vocabularyName);
        return (vocabulary != null) ? vocabulary.getVocabularyId() : 0;
    }
  2. Make sure to include the following imports.

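    import com.liferay.asset.kernel.model.AssetVocabulary;
    import com.liferay.asset.kernel.service.AssetCategoryLocalService;
    import com.liferay.asset.kernel.service.AssetVocabularyLocalService;
    import com.liferay.petra.string.StringPool;
    import com.liferay.portal.kernel.service.ServiceContext;
    import com.liferay.portal.search.aggregation.Aggregations;
    import com.liferay.portal.search.aggregation.bucket.IncludeExcludeClause;
    import com.liferay.portal.search.aggregation.bucket.TermsAggregation;
    import com.liferay.portal.search.aggregation.bucket.TermsAggregationResult;
    import com.liferay.portal.search.query.BooleanQuery;
    import com.liferay.portal.search.query.Queries;
    import com.liferay.portal.search.searcher.SearchRequest;
    import com.liferay.portal.search.searcher.SearchRequestBuilder;
    import com.liferay.portal.search.searcher.SearchRequestBuilderFactory;
    import com.liferay.portal.search.searcher.SearchResponse;
    import com.liferay.portal.search.searcher.Searcher;
    import org.osgi.service.component.annotations.Reference;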
  3. Ensure that you have the necessary references included in your class.

    @Reference 
    private Searcher searcher; 
    
    @Reference 
    private Aggregations aggregations; 
    
    @Reference 
    private Queries queries; 
    
    @Reference 
    private AssetCategoryLocalService assetCategoryLocalService; 
    
    @Reference 
    private AssetVocabularyLocalService assetVocabularyLocalService; 
    
    @Reference 
    private SearchRequestBuilderFactory searchRequestBuilderFactory; 
  4. Building the Elasticsearch Aggregation Query request.

    private SearchRequest buildNewsAggregationQuery(long vocabularyId, ServiceContext serviceContext) {

        SearchRequestBuilder searchRequestBuilder = searchRequestBuilderFactory.builder();
        searchRequestBuilder.size(1000);

        searchRequestBuilder.companyId(serviceContext.getCompanyId());
        searchRequestBuilder.emptySearchEnabled(true);

        // Restrict the search to the latest version of journal articles that use the News structure
        BooleanQuery booleanQuery = queries.booleanQuery();

        try {
            booleanQuery.addMustQueryClauses(
                queries.term("entryClassName", "com.liferay.journal.model.JournalArticle"),
                queries.term("ddmStructureKey", "34115"),
                queries.term("latest", true)
            );
        } catch (Exception e) {
            _log.error("Unable to build the News query", e);
        }

        // Group by category and keep only the keys that start with "<vocabularyId>-"
        TermsAggregation aggregation = aggregations.terms("news", "assetVocabularyCategoryIds");
        aggregation.setIncludeExcludeClause(new IncludeExcludeClauseImpl(vocabularyId + "-.*", StringPool.BLANK));

        searchRequestBuilder.addAggregation(aggregation);
        return searchRequestBuilder.query(booleanQuery).build();
    }
    
  5. To ensure proper filtering of aggregation results, we need to implement the IncludeExcludeClauseImpl class. This class is responsible for defining include/exclude regex patterns that help filter categories within our Elasticsearch Aggregation Query. Create the IncludeExcludeClauseImpl class using the Liferay source code reference below.

    public class IncludeExcludeClauseImpl implements IncludeExcludeClause {  
    
        private String[] excludedValues; 
        private String excludeRegex;  
        private String[] includedValues;  
        private String includeRegex; 
        
        public IncludeExcludeClauseImpl(String includeRegex, String excludeRegex) { 
            this.includeRegex = includeRegex; 
            this.excludeRegex = excludeRegex; 
        } 
         
        public String[] getExcludedValues() { 
            return this.excludedValues; 
        } 
         
        public String getExcludeRegex() { 
            return this.excludeRegex; 
        } 
         
        public String[] getIncludedValues() { 
            return this.includedValues; 
        } 
         
        public String getIncludeRegex() { 
            return this.includeRegex; 
        }
    }
  6. Executing the Search Query

    SearchRequest searchRequest = buildNewsAggregationQuery(vocabularyId, serviceContext);
    SearchResponse searchResponse = searcher.search(searchRequest); 
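
To make the effect of the include pattern from step 4 (vocabularyId + "-.*") more concrete, here is a minimal plain-Java sketch. It uses java.util.regex as a stand-in for the regex engine Elasticsearch applies, on the assumption that both behave the same for a pattern this simple; Elasticsearch matches the pattern against the whole term, which String-level matches() also does. The class name, helper methods, and sample keys are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustration only: java.util.regex stands in for the engine that
// Elasticsearch uses to evaluate the "include" clause.
final class IncludePatternSketch {

    // Builds the same pattern string as step 4: "<vocabularyId>-.*"
    static String includeRegex(long vocabularyId) {
        return vocabularyId + "-.*";
    }

    // Returns the keys the include pattern would keep (whole-string match).
    static List<String> filterKeys(List<String> keys, long vocabularyId) {
        Pattern pattern = Pattern.compile(includeRegex(vocabularyId));

        return keys.stream()
            .filter(key -> pattern.matcher(key).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical keys in the "<vocabularyId>-<categoryId>" form
        List<String> keys = List.of(
            "34105-34110", "34105-34111", "40001-40007", "134105-34110");

        // Only keys whose vocabulary part is exactly 34105 are kept
        System.out.println(filterKeys(keys, 34105L));
    }
}
```

Because the match covers the whole key, a key such as "134105-34110" is not kept even though it contains "34105-".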

Process the response in Java

  1. Retrieve Aggregation Results: Extract the TermsAggregationResult named "news" from the SearchResponse. It contains the aggregation results for the news categories.
    TermsAggregationResult aggregationResult = (TermsAggregationResult) searchResponse.getAggregationResult("news");
  2. Process Buckets: Iterate over each bucket (category group) in the aggregation result.
    if (aggregationResult != null) {
        aggregationResult.getBuckets().forEach(bucket -> {
            String key = bucket.getKey();
            long documentCount = bucket.getDocCount(); // Get document count

            // Add the custom business logic to process the aggregation data

        });
    }
  3. Collect the raw key-value pairs: The following snippet stores each bucket key with its document count in a map and logs the entries.

    Map<String, Long> categoryResult = new HashMap<>();

    TermsAggregationResult aggregationResult = (TermsAggregationResult) searchResponse.getAggregationResult("news");

    if (aggregationResult != null) {
        aggregationResult.getBuckets().forEach(bucket -> {
            String key = bucket.getKey();
            long documentCount = bucket.getDocCount(); // Get document count
            if (key != null) {
                categoryResult.put(key, documentCount);
            }
        });
    }

    for (Map.Entry<String, Long> entry : categoryResult.entrySet()) {
        _log.debug("Label: " + entry.getKey() + ", Value: " + entry.getValue());
    }
    The following image shows the raw key-value pairs obtained using the snippet above.
  4. Further process or print the categoryResult to utilize the extracted bucket results effectively.

  5. Resolving Category IDs into Names: Each bucket key has the form <vocabularyId>-<categoryId>. The following snippet extracts the category ID from the key and resolves it into a meaningful name using assetCategoryLocalService.
    Map<String, Long> categoryResult = new HashMap<>();

    TermsAggregationResult aggregationResult = (TermsAggregationResult) searchResponse.getAggregationResult("news");

    if (aggregationResult != null) {
        aggregationResult.getBuckets().forEach(bucket -> {
            // The part after the "-" is the category ID
            String[] keyParts = bucket.getKey().split("-");
            if (keyParts.length < 2) {
                return; // Skip keys that do not have the expected form
            }

            // GetterUtil.getLong returns 0 when the value is not a valid number
            long categoryId = GetterUtil.getLong(keyParts[1]);
            if (categoryId <= 0) {
                return;
            }

            try {
                String name = assetCategoryLocalService.getAssetCategory(categoryId).getName();
                if (Validator.isNotNull(name)) {
                    categoryResult.put(name, bucket.getDocCount());
                }
            } catch (PortalException e) {
                throw new RuntimeException(e);
            }
        });
    }

    for (Map.Entry<String, Long> entry : categoryResult.entrySet()) {
        _log.debug("Label: " + entry.getKey() + ", Value: " + entry.getValue());
    }
    The following image shows the category IDs resolved into meaningful names using assetCategoryLocalService.
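
The key-parsing part of this step can also be tried outside Liferay. The following is a minimal plain-Java sketch, assuming bucket keys of the form <vocabularyId>-<categoryId> as used above. The class, the method names, and the sample counts are hypothetical; the lookup through assetCategoryLocalService is deliberately left out so that the example runs without a portal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

// Illustration only: parses bucket keys without any Liferay dependency.
final class BucketKeySketch {

    // Extracts the category ID from a key of the assumed form
    // "<vocabularyId>-<categoryId>"; returns empty for malformed keys.
    static OptionalLong categoryIdOf(String bucketKey) {
        if (bucketKey == null) {
            return OptionalLong.empty();
        }

        int separator = bucketKey.indexOf('-');

        if (separator < 0 || separator == bucketKey.length() - 1) {
            return OptionalLong.empty();
        }

        try {
            return OptionalLong.of(
                Long.parseLong(bucketKey.substring(separator + 1)));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    // Converts raw "key -> document count" pairs into
    // "categoryId -> document count", skipping malformed keys.
    static Map<Long, Long> countsByCategoryId(Map<String, Long> rawCounts) {
        Map<Long, Long> result = new LinkedHashMap<>();

        rawCounts.forEach((key, count) ->
            categoryIdOf(key).ifPresent(
                id -> result.merge(id, count, Long::sum)));

        return result;
    }

    public static void main(String[] args) {
        // Hypothetical aggregation output
        Map<String, Long> rawCounts = new LinkedHashMap<>();
        rawCounts.put("34105-34110", 3L);
        rawCounts.put("34105-34111", 1L);
        rawCounts.put("malformed", 7L);

        System.out.println(countsByCategoryId(rawCounts)); // prints {34110=3, 34111=1}
    }
}
```

Returning an empty OptionalLong for unexpected keys keeps one malformed bucket from aborting the whole loop, which is the main difference from calling Long.parseLong directly on the split result.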

Conclusion

In summary, using Elasticsearch Aggregation Queries within Liferay DXP helps streamline data management by ensuring unique, categorized results. Whether you're dealing with large datasets for reporting, search facets, or dashboards, Liferay Elasticsearch Integration optimizes performance and prevents data redundancy. By grouping data based on specific fields, you not only improve the user experience but also enhance the clarity of search results, ensuring that only relevant and distinct data is presented.

The Java implementation combines a Boolean query, which narrows the search to the latest versions of the News articles, with a terms aggregation that is filtered to the News vocabulary. By following these steps, you can retrieve the distinct categories used by your content along with their document counts, and use them to improve the search experience and data management in your Liferay applications.
