Data engineering

Idempotency - don't play dice

Idempotency is a cornerstone of reproducibility: rerunning a process on the same input leaves you with the same result as running it once. Without it you cannot trust the results of your transformations. For a process to be idempotent, every one of its components needs to have this property. A pipeline either is idempotent or it is not; there are no 'grades'.
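
To make the difference concrete, here is a minimal sketch that contrasts an appending load with an overwriting one. The in-memory `Table` type, the function names and the sample rows are all hypothetical and stand in for whatever storage your pipeline really writes to.

```python
from typing import Dict, List

# Hypothetical in-memory "table": partition key -> rows.
Table = Dict[str, List[dict]]


def load_partition_append(table: Table, day: str, rows: List[dict]) -> None:
    """Not idempotent: every rerun appends the same rows again."""
    table.setdefault(day, []).extend(rows)


def load_partition_overwrite(table: Table, day: str, rows: List[dict]) -> None:
    """Idempotent: a rerun replaces the partition, so the end state is unchanged."""
    table[day] = list(rows)


# Made-up sample data for illustration only.
rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]

appended: Table = {}
overwritten: Table = {}
for _ in range(2):  # simulate a retry of the same run
    load_partition_append(appended, "2024-01-01", rows)
    load_partition_overwrite(overwritten, "2024-01-01", rows)

print(len(appended["2024-01-01"]))     # 4: the retry duplicated the data
print(len(overwritten["2024-01-01"]))  # 2: the retry changed nothing
```

The overwrite variant only helps if every other step in the pipeline behaves the same way; a single appending step is enough to make the whole run unsafe to retry.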

Immutability - containing data chaos

Keeping your data unchanged for the duration of processing (ideally versioned) saves you from the hell of dealing with moving parts. If your pipeline changes the source data in place, you're asking for a catastrophe.
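
One way to picture this is a sketch in which the pipeline first freezes its input as a read-only snapshot named after the content hash, and every transformation writes a new file instead of touching the snapshot. The `snapshot` and `transform` helpers, the file names and the sample content are assumptions made up for this illustration, not a prescribed layout.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def snapshot(source: Path, store: Path) -> Path:
    """Copy the source into the store under a content-derived name (hypothetical helper)."""
    digest = hashlib.sha256(source.read_bytes()).hexdigest()[:12]
    target = store / f"{source.stem}.{digest}{source.suffix}"
    if not target.exists():
        shutil.copy2(source, target)
        target.chmod(0o444)  # read-only, to discourage in-place edits
    return target


def transform(snapshot_path: Path, out_dir: Path) -> Path:
    """Read the frozen snapshot and write the result to a separate file."""
    out = out_dir / f"clean.{snapshot_path.name}"
    lines = snapshot_path.read_text().splitlines()
    out.write_text("\n".join(line.strip().lower() for line in lines) + "\n")
    return out


# Made-up sample input in a temporary directory.
workdir = Path(tempfile.mkdtemp())
source = workdir / "events.csv"
source.write_text("ID,Name\n1, Alice \n")

frozen = snapshot(source, workdir)
before = frozen.read_bytes()
result = transform(frozen, workdir)

# The upstream file may keep changing; the snapshot the run used does not.
source.write_text("ID,Name\n1, Bob\n")
print(frozen.read_bytes() == before)
```

Because the snapshot name is derived from the content, a changed source produces a new snapshot alongside the old one, which gives you a crude form of versioning for free.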