Pretraining with hierarchical memories: separating long-tail and common knowledge | Synapse