First: do enterprises (assuming they actually use deep learning, which in my experience kind of works) retrain the neural network at each step, as each new row of data arrives? And if so, do they use all the data as the training set and validate on nothing?
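For concreteness, the alternative to "validate nothing" that I have in mind is an expanding-window (walk-forward) scheme, where the model is retrained on everything up to some point and checked on the next few rows. A minimal sketch of the index bookkeeping only; `walk_forward_splits` and its parameters are names I made up for illustration, not something taken from a library:

```python
def walk_forward_splits(n_rows, initial_train, val_size, step=1):
    """Yield (train_idx, val_idx) index ranges for an expanding window.

    Hypothetical helper: train on rows [0, end_train), validate on the
    next `val_size` rows, then move the boundary forward by `step` rows.
    """
    end_train = initial_train
    while end_train + val_size <= n_rows:
        yield range(0, end_train), range(end_train, end_train + val_size)
        end_train += step


# Example: 10 rows, start with 6 training rows, validate on 2, retrain every 2 rows.
for train_idx, val_idx in walk_forward_splits(10, 6, 2, step=2):
    print(list(train_idx)[-1], list(val_idx))
```

With `step=1` this corresponds to retraining on every new row; a larger `step` retrains less often.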
Second: do they apply some kernel (a weighting) to the loss so that model errors on more recent data are penalized more heavily? That would seem extremely logical to me.
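To make the second question concrete, this is roughly the kind of kernel I mean: an exponential decay over the age of each sample, applied to a squared-error loss. It is only a sketch in NumPy; the exponential form, the `half_life` parameter, and the function names are my own assumptions, and other kernels would work the same way.

```python
import numpy as np


def recency_weights(n_samples, half_life):
    """Exponential-decay weights; the last (newest) sample gets the largest weight."""
    age = np.arange(n_samples - 1, -1, -1)  # age 0 = most recent row
    w = 0.5 ** (age / half_life)            # weight halves every `half_life` rows
    return w / w.sum()                      # normalise so the weights sum to 1


def weighted_mse(y_true, y_pred, weights):
    """Squared error averaged with per-sample weights (weights assumed to sum to 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum(weights * (y_true - y_pred) ** 2))


w = recency_weights(4, half_life=1)
loss = weighted_mse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], w)
```

In this example the only error is on the newest row, so it dominates the loss; the same error on the oldest row would count eight times less.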
Third: which architecture is most often used? I've been using an LSTM, but the SOTA architectures in [1] are far more advanced.
Fourth: since we are dealing with paths that have a strong stochastic component, does it make more sense to train for a binary output with a cross-entropy loss instead of regressing the value of the target? As in: I will make this decision if the value exceeds 'x', so let me train the NN to output the probability of exceeding 'x'. Or maybe have it output p-confidence intervals, for example, instead of naively trying to guess the value.
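The two alternatives in the fourth question can be written down as losses. A minimal NumPy sketch: the first pair turns the target into an "exceeds x" label and scores predicted probabilities with binary cross-entropy; the last function is the pinball (quantile) loss, which is one common way to train interval bounds. I am assuming the quantile route here as a stand-in for "p-confidence intervals", and all function names are mine.

```python
import numpy as np


def binary_target(y, threshold):
    """1.0 where the value exceeds the decision threshold 'x', else 0.0."""
    return (np.asarray(y, dtype=float) > threshold).astype(float)


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean cross-entropy between 0/1 labels and predicted probabilities."""
    labels = np.asarray(labels, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)))


def pinball_loss(y_true, y_pred, q):
    """Quantile loss: minimised when y_pred is the q-th quantile of y_true."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


labels = binary_target([1.0, 3.0], threshold=2.0)   # array([0., 1.])
bce = binary_cross_entropy(labels, [0.5, 0.5])      # log(2) for a coin-flip prediction
```

Training one output with `q=0.05` and another with `q=0.95` would give a rough 90% interval, under the assumption that the two quantile outputs are fitted well.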
[1] https://github.com/thuml/Time-Series-Library