As you can see, the results are very diverse. The model seems to capture the main topic discussed but fails to compose a natural sentence. [Shen et al.](https://arxiv.org/abs/1802.02032) theorized that "BOW avoids the KL-vanishing problem, but the overall performance will significantly decrease because adding an additional loss, in theory, leads to a biased result for latent variables. BOW information is encoded into the latent variable, but it prevents the decoder from stably learning the word order pattern in the training step thus sacrifices the NLL performance." They also note that "BOW, though not good at the NLL metric, performed remarkably well on this metric [topic similarity], which implies BOW is beneficial for the decoder to generate the correct high-level meaning but fails to transform the meaning to a fluent sentence."
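To make the "additional loss" in the quote concrete, here is a minimal pure-Python sketch of a bag-of-words (BOW) auxiliary term. It is an illustration under stated assumptions, not the code behind the results above: `bow_loss`, `total_loss`, and `kl_weight` are hypothetical names, and I assume some projection of the latent variable produces one logit per vocabulary word.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def bow_loss(vocab_logits, target_ids):
    """Negative log-likelihood of the target words, ignoring their order.

    vocab_logits: one score per vocabulary word, assumed to be predicted
                  from the latent variable alone (hypothetical projection).
    target_ids:   word ids of the response sentence.
    """
    log_probs = log_softmax(vocab_logits)
    return -sum(log_probs[i] for i in target_ids)

def total_loss(nll, kl, bow, kl_weight=1.0):
    """Assumed objective: decoder NLL + weighted KL + BOW auxiliary term."""
    return nll + kl_weight * kl + bow

# Toy vocabulary of 4 words; the same words in two different orders.
logits = [2.0, 0.5, -1.0, 0.0]
loss_a = bow_loss(logits, [0, 1, 3])
loss_b = bow_loss(logits, [3, 0, 1])
```

Because `bow_loss` sums over the target words without regard to position, `loss_a` and `loss_b` are equal up to floating-point rounding. That is the sense in which the term pushes topic information into the latent variable while giving the decoder no signal about word order.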
I tried training Shen et al.'s model on my dataset but got very poor results, even though the validation loss kept decreasing. Whether I implemented something incorrectly, chose bad hyperparameters, or the model simply isn't good enough remains to be explored.
I'm working on a project that has the potential to really improve people's lives. I'm looking for someone with a deep understanding of conversational models. For more info: robi9011235 (at sign) gmail (dot) com