
Elasticsearch Best Practice

The App Store Connect installer deploys the Big Data Service: a single-node Elasticsearch cluster that App Store Connect uses for local data storage. The main use cases are when App Store Connect is configured to record process data to IP Historian and/or to record Alarm & Event data to Alarm Analysis. If significant volumes of data are stored, the server administrator should be prepared to monitor and reconfigure the Elasticsearch installation to maintain performance and availability.

Elasticsearch Terminology

Cluster: An Elasticsearch deployment is a cluster of one or more nodes. The default “Big Data Service” install is composed of only a single node, running on the same server as the other App Store Connect processes.

Node: A node is a single running instance of Elasticsearch. Typically one node runs per server.

Index: An index is where documents are stored.

Document: A document is a structured data object, similar to a record.

Shard: An index is made up of one or more shards. A shard is either a “primary” or a “replica” shard:
  • A primary shard is responsible for write operations (index, re-index and delete) and reads.
  • A replica shard is responsible only for read operations (searches and gets).

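Each primary shard holds a subset of the documents in the index, and each replica shard is a complete copy of one primary shard. Duplicating data in replicas improves availability and read throughput. On the default single-node install, replica shards cannot be allocated, because Elasticsearch never places a replica on the same node as its primary.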

Factors Affecting Elasticsearch Performance

Server Constraints

The server hosting App Store Connect must have suitable resources for processing and storing data. Consider:

  • CPU
  • RAM
  • Drive size
  • Drive type (SSD or hard disk).

Balance this against your requirements:

  • What are the data ingestion rates?
  • Is data collection constant or at particular times of the day?
  • How many users are supported and what activities do they perform?
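Drive size can be balanced against ingestion rate with a back-of-envelope estimate. The sketch below is a hypothetical helper, not part of App Store Connect; the daily ingest rate and retention period are figures you would measure or decide yourself, and real on-disk size also depends on mappings and compression.

```python
# Rough disk-sizing sketch. All inputs are figures you supply; real on-disk
# size also depends on mappings, compression and segment merging.


def estimate_storage_gb(daily_ingest_gb: float, retention_days: int, replicas: int = 0) -> float:
    """Estimate the disk space needed to keep `retention_days` of data.

    Each replica is a full extra copy of a primary shard, so storage scales
    with (1 + replicas). On a single-node cluster replicas are never allocated,
    so replicas=0 is the realistic value for the default install.
    """
    if daily_ingest_gb < 0 or retention_days < 0 or replicas < 0:
        raise ValueError("inputs must be non-negative")
    return daily_ingest_gb * retention_days * (1 + replicas)


if __name__ == "__main__":
    # Example: 0.5 GB/day retained for two years on the default single node.
    needed = estimate_storage_gb(0.5, 2 * 365)
    print(f"~{needed:.0f} GB of index data")  # ~365 GB
```

Compare the result with the free space on the drive, leaving headroom for merges and for growth in ingestion rate.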

Sharding

Data in Elasticsearch is stored in shards, and both the size and the number of shards affect performance and stability.

To obtain a list of all the shards on an Elasticsearch node, open the following URL on your App Store Connect server:

http://localhost:9200/_cat/shards?v
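The same information can be checked from a script. This is a minimal sketch, assuming the node answers on localhost:9200 as above; it uses the standard `format=json`, `bytes=b` and `h` parameters of the _cat API and flags primary shards outside the 10GB to 50GB guideline below. The helper names are hypothetical.

```python
import json
import urllib.request

# _cat/shards as JSON, sizes in raw bytes, only the columns we need.
CAT_SHARDS_URL = (
    "http://localhost:9200/_cat/shards"
    "?format=json&bytes=b&h=index,shard,prirep,state,store"
)
GB = 1024 ** 3
MIN_GB, MAX_GB = 10, 50  # shard-size guideline from this page


def fetch_shards(url: str = CAT_SHARDS_URL) -> list[dict]:
    """Return one dict per shard as reported by the _cat/shards API."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def flag_shards(rows: list[dict]) -> list[tuple[str, str, float, str]]:
    """Flag primary shards whose size falls outside the 10-50 GB guideline.

    Replicas are skipped, as are shards with no store size (e.g. unassigned).
    """
    flagged = []
    for row in rows:
        if row.get("prirep") != "p" or not row.get("store"):
            continue
        size_gb = int(row["store"]) / GB
        if size_gb < MIN_GB:
            flagged.append((row["index"], row["shard"], size_gb, "small"))
        elif size_gb > MAX_GB:
            flagged.append((row["index"], row["shard"], size_gb, "large"))
    return flagged
```

For example, `rows = fetch_shards()` followed by printing `len(rows)` and each entry of `flag_shards(rows)` gives the total shard count (relevant to the shards-per-GB guideline) and the shards that are too small or too large.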

Guidelines:

Aim for shard sizes between 10GB and 50GB

Whilst applications manage most data processes, there may be some scope to modify sharding and indexing in the data core driver configuration. For example, in IP Historian you can set how many shards are created per index, and whether data is stored in monthly indices, yearly indices, or one big index.

Aim for 20 shards or fewer per GB of RAM allocated to Elasticsearch

A default installation of App Store Connect allocates only 1GB of RAM to Elasticsearch, which gives a budget of 20 shards. If App Store Connect is importing Alarm Analysis data, it creates 2 shards per calendar month for each asset, so a single asset reaches the recommended limit within 10 months. Even a node configured with the recommended maximum of 32GB RAM (a budget of 640 shards) reaches its limit after a little over 5 years when recording Alarm Analysis data for 5 assets: 5 assets × 2 shards per month = 10 shards per month, and 640 / 10 = 64 months.
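The shard-budget arithmetic can be reproduced for other configurations with a small calculation. This is a sketch using the 20-shards-per-GB guideline from this page; the `existing_shards` parameter is an assumption added for illustration, since shards already on the node (IP Historian indices, system indices) count against the same budget.

```python
SHARDS_PER_GB = 20  # guideline: at most 20 shards per GB allocated to Elasticsearch


def shard_budget(allocated_gb: float) -> int:
    """Maximum recommended number of shards on the node."""
    return int(SHARDS_PER_GB * allocated_gb)


def months_within_budget(allocated_gb: float, shards_per_month_per_asset: int,
                         assets: int = 1, existing_shards: int = 0) -> int:
    """Whole months of new data that fit before the shard budget is reached."""
    remaining = shard_budget(allocated_gb) - existing_shards
    if remaining <= 0:
        return 0
    return remaining // (shards_per_month_per_asset * assets)
```

`months_within_budget(1, 2)` gives 10 months and `months_within_budget(32, 2, assets=5)` gives 64 months; passing the current shard count as `existing_shards` shows the remaining headroom on a live node.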

Exceeding these limits may lead to issues such as index corruption. For Alarm Analysis, this could manifest as the loss of monthly blocks of data.

What to do if you are approaching advised limits?

Most of the options below require specialist knowledge of Elasticsearch. Consult the Intelligent Plant team if you are unsure how to proceed.

Data Source Configuration
Change data index settings that determine how data is stored. IP Historian exposes index type and sharding properties.
NB. Modifying these settings on a live system is not advised.

Scale Up
Increase server RAM and allocate more of it to the Big Data Service process.
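Adding RAM to the server does not by itself enlarge Elasticsearch's JVM heap, which is most likely what "RAM allocated to Elasticsearch" refers to above; the heap is set with `-Xms`/`-Xmx` in Elasticsearch's `jvm.options` (or the `ES_JAVA_OPTS` environment variable), and where App Store Connect keeps that setting is not covered on this page. Elastic advises setting both to the same value and to no more than half of physical RAM. A minimal sketch, assuming the default localhost:9200 endpoint, that reads the configured heap from the node stats API and derives the shard budget:

```python
import json
import urllib.request

# Node stats endpoint on the default local install (assumed port 9200, as above).
NODES_JVM_URL = "http://localhost:9200/_nodes/stats/jvm"
SHARDS_PER_GB = 20  # guideline from this page


def fetch_node_stats(url: str = NODES_JVM_URL) -> dict:
    """Fetch JVM statistics for every node in the cluster."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def heap_report(stats: dict) -> dict[str, tuple[float, int]]:
    """Map node name -> (max JVM heap in GB, shard budget at 20 per GB)."""
    report = {}
    for node in stats["nodes"].values():
        heap_gb = node["jvm"]["mem"]["heap_max_in_bytes"] / 1024 ** 3
        report[node["name"]] = (heap_gb, int(SHARDS_PER_GB * heap_gb))
    return report
```

Running `print(heap_report(fetch_node_stats()))` before and after the change confirms whether the new heap size has taken effect.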

Scale Out
Introduce more Elasticsearch nodes on dedicated infrastructure.

Data Archiving
Either remove old data, or migrate it to another App Store Connect instance dedicated to archiving.
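Before removing or migrating anything, it helps to list which indices are old enough to archive. The sketch below is a dry run only: it reads the standard `creation.date` column (epoch milliseconds) from `_cat/indices` and returns candidate names, assuming the default localhost:9200 endpoint. The `prefix` filter is hypothetical, since this page does not document App Store Connect's index naming. Deleting an index (`DELETE /<index>`) is irreversible, so migrate or back up the data first.

```python
import json
import time
import urllib.request
from typing import Optional

# creation.date is a standard _cat/indices column (milliseconds since epoch).
CAT_INDICES_URL = "http://localhost:9200/_cat/indices?format=json&h=index,creation.date"
DAY_MS = 24 * 60 * 60 * 1000


def fetch_indices(url: str = CAT_INDICES_URL) -> list[dict]:
    """Return one dict per index with its name and creation date."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def archive_candidates(rows: list[dict], older_than_days: int,
                       prefix: str = "", now_ms: Optional[int] = None) -> list[str]:
    """Names of indices created more than `older_than_days` ago.

    `prefix` is a hypothetical name filter; indices starting with '.'
    (system indices) are always skipped. Nothing is deleted here.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - older_than_days * DAY_MS
    return sorted(
        r["index"] for r in rows
        if not r["index"].startswith(".")
        and r["index"].startswith(prefix)
        and int(r["creation.date"]) < cutoff
    )
```

For example, `archive_candidates(fetch_indices(), 3 * 365)` lists indices older than about three years.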

Data Consolidation
Re-index smaller indices together. For instance, consolidate Alarm Analysis monthly indices into single (one-shard) indices.
NB. The new index needs aliases matching the names of all the indices it replaces, so that applications continue to function.
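The consolidation can be expressed as three Elasticsearch REST calls: create a one-shard target index, copy documents with the Reindex API, then, in one atomic Aliases API request, add an alias named after each old index and delete that old index with the `remove_index` action. The sketch below only builds those requests; the index names are hypothetical, and `mappings` is assumed to be copied from the old indices (for example via `GET /<old-index>/_mapping`) so field types are preserved.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # default local Big Data Service endpoint


def consolidation_requests(old_indices: list[str], new_index: str,
                           mappings: dict) -> list[tuple[str, str, dict]]:
    """Build, in order, the REST calls that merge `old_indices` into one
    single-shard index and keep the old names working as aliases."""
    return [
        # 1. Create the target with one primary shard; replicas are set to 0
        #    because they cannot be allocated on a single node anyway.
        ("PUT", f"/{new_index}", {
            "settings": {"number_of_shards": 1, "number_of_replicas": 0},
            "mappings": mappings,
        }),
        # 2. Copy every document from the old indices into the new index.
        ("POST", "/_reindex",
         {"source": {"index": old_indices}, "dest": {"index": new_index}}),
        # 3. Atomically alias each old name to the new index and delete the
        #    old index (irreversible).
        ("POST", "/_aliases", {"actions": (
            [{"add": {"index": new_index, "alias": name}} for name in old_indices]
            + [{"remove_index": {"index": name}} for name in old_indices]
        )}),
    ]


def send(method: str, path: str, body: dict, base: str = BASE_URL) -> dict:
    """Send one JSON request to Elasticsearch and return the parsed reply."""
    req = urllib.request.Request(
        base + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=3600) as resp:
        return json.load(resp)
```

Send the requests one at a time and compare document counts (for example with `GET /<index>/_count`) before sending the `_aliases` request, because `remove_index` deletes the original indices.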

big_data_service/best_practice.txt · Last modified: 2022/11/17 16:22 by su