big_data_service:best_practice

Differences

This shows you the differences between two versions of the page.

big_data_service:best_practice [2022/11/17 13:24] su big_data_service:best_practice [2022/11/17 16:22] (current) – [What to do if you are approaching advised limits?] su
Line 11: Line 11:
 | Shard | An index is made up of one or more shards. \\ \\ A shard is defined as either a "primary" or "replica" shard: \\ * A primary shard is responsible for write operations (index, re-index and delete) and reads. \\ * A replica shard is responsible only for read operations (searches and gets). \\ \\ Each shard is a complete set of all the data in the index. The reason for duplicating the data across multiple shards is to improve availability and read/write throughput. | | Shard | An index is made up of one or more shards. \\ \\ A shard is defined as either a "primary" or "replica" shard: \\ * A primary shard is responsible for write operations (index, re-index and delete) and reads. \\ * A replica shard is responsible only for read operations (searches and gets). \\ \\ Each shard is a complete set of all the data in the index. The reason for duplicating the data across multiple shards is to improve availability and read/write throughput. |
  
-===== Considerations... ===== +===== Factors affecting elasticsearch performance... =====
  
 ==== Server Constraints ==== ==== Server Constraints ====
  
-The server hosting App Store Connect must have suitable resources for processing and storing data. Consider CPURAM, dirve size, drive type (SSD or hard disk). +The server hosting App Store Connect must have suitable resources for processing and storing data. Consider... 
 +  * CPU 
 +  * RAM 
 +  * Drive size 
 +  * Drive type (SSD or hard disk). 
  
 Balance this against your requirements: Balance this against your requirements:
- What are the data ingestion rates? +  * What are the data ingestion rates? 
- - Are these consistant or do they vary over the course of the day? +  * Is data collection constant or at particular times of the day? 
- How manu users must you support? +  * How many users are supported and what activities do they perform?
  
 ==== Sharding ==== ==== Sharding ====
 +
 +Data in elasticsearch are stored in shards and the size and number of shards will impact performance and stability.
  
 To obtain a list of all the shards on an elasticsearch node, open the following URL on your App Store Connect server.  To obtain a list of all the shards on an elasticsearch node, open the following URL on your App Store Connect server. 
Line 38: Line 44:
 **Aim for 20 shards or fewer per GB of RAM allocated to Elasticsearch** **Aim for 20 shards or fewer per GB of RAM allocated to Elasticsearch**
  
-A default configuration of App Store Connect allocates only 1GB of RAM to Elasticsearch. If App Store Connect is importing data, it will create 2 shards per calednar month - meaning you will reach the recommended limit within 10 months. +A default installation of App Store Connect allocates only 1GB of RAM to Elasticsearch. If App Store Connect is importing Alarm Analysis data, it will create 2 shards per calendar month - thus reaching the recommended limit of 20 shards within 10 months. Even a node configured with the maximum 32GB of RAM will reach its limit of 640 shards in a little over 5 years if recording Alarm Analysis data for 5 assets (at 2 shards per asset per month).
  
-A node configured with the max 32GB RAM will be capable of supporing 640 indices - the equivalent of 5 years Alarm Analysis data for 5 assets. +Exceeding these limits may lead to issues such as index corruption. For Alarm Analysis, this could manifest as the loss of monthly blocks of data.
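
The shard arithmetic above can be checked with a short calculation. This is only a sketch: the function names are hypothetical, and it assumes that each asset recording Alarm Analysis data adds 2 shards per calendar month, which is how the 5-asset figure above appears to be derived.

```python
SHARDS_PER_GB = 20  # guideline quoted above: 20 shards or fewer per GB of RAM


def shard_budget(ram_gb, shards_per_gb=SHARDS_PER_GB):
    """Advised maximum number of shards for the RAM allocated to Elasticsearch."""
    return int(ram_gb * shards_per_gb)


def months_until_limit(ram_gb, assets, shards_per_asset_per_month=2, existing_shards=0):
    """Whole months of data that fit before the advised shard limit is reached.

    Assumption: every asset adds `shards_per_asset_per_month` shards each month.
    """
    remaining = shard_budget(ram_gb) - existing_shards
    if remaining <= 0:
        return 0
    return remaining // (assets * shards_per_asset_per_month)


if __name__ == "__main__":
    print(months_until_limit(ram_gb=1, assets=1))   # default 1GB installation
    print(months_until_limit(ram_gb=32, assets=5))  # 32GB node, 5 assets
```

With these assumptions a 1GB node has room for 10 months of data for one asset, and a 32GB node has room for 64 months (a little over 5 years) for 5 assets.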
  
 +===== What to do if you are approaching advised limits? =====
 +Most of the options below require specialist knowledge of elasticsearch. Consult with the Intelligent Plant team if unsure how to proceed. 
  
 +**Data Source Configuration** \\
 +Change data index settings that determine how data is stored. IP Historian exposes index type and sharding properties. \\
 +NB. Modifying these settings on a live system is not advised.
  
 +**Scale Up** \\
 +Increase server RAM and allocate more of it to the Big Data process.
  
-This may result in spurious index corruption issues. +**Scale Out** \\ 
 +Introduce more elasticsearch nodes on dedicated infrastructure.
  
 +**Data Archiving** \\
 +Either remove old data, or migrate old data to another App Store Connect instance dedicated to archiving.
  
 +**Data Consolidation** \\
 +Re-index smaller indices together. For instance, consolidate Alarm Analysis monthly indices into single (one shard) indices. \\ 
 +NB. The new index would need aliases matching the names of all the indices it replaces, so that the application continues to function.
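
As a rough illustration of the consolidation option, the sketch below builds the JSON bodies that would be sent to the standard Elasticsearch `_reindex` and `_aliases` endpoints. The index names and helper functions are hypothetical and are not taken from App Store Connect. It only constructs the request bodies and does not contact a server. The `remove_index` action deletes the old index, so the re-index result should be verified before the alias request is sent.

```python
import json


def build_reindex_body(source_indices, dest_index):
    """Body for POST _reindex: copy documents from several indices into one."""
    return {
        "source": {"index": list(source_indices)},
        "dest": {"index": dest_index},
    }


def build_alias_swap_body(source_indices, dest_index):
    """Body for POST _aliases: drop each old index and reuse its name as an alias.

    An alias cannot share a name with an existing index, so every old index
    is removed in the same request that adds the alias.
    """
    actions = []
    for name in source_indices:
        actions.append({"remove_index": {"index": name}})
        actions.append({"add": {"index": dest_index, "alias": name}})
    return {"actions": actions}


if __name__ == "__main__":
    # Hypothetical monthly index names, for illustration only.
    monthly = ["alarm-analysis-2022-01", "alarm-analysis-2022-02"]
    print(json.dumps(build_reindex_body(monthly, "alarm-analysis-2022"), indent=2))
    print(json.dumps(build_alias_swap_body(monthly, "alarm-analysis-2022"), indent=2))
```

Keeping the old names as aliases means that anything which queries a monthly index by name still resolves to the consolidated index.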
  
big_data_service/best_practice.1668691498.txt.gz · Last modified: 2022/11/17 13:24 by su