Text Generation: A Detailed List of Public Datasets, Open-Source Tools, and Classic Papers

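This is a list of public datasets, open-source tools, and classic papers on text generation, compiled by the Natural Language Processing group at Tsinghua University. The list is continually updated with new papers and revisions, and is shared here for everyone.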

Source: https://github.com/THUNLP-MT/TG-Reading-List

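Public Datasets

Story Generation

Text Generation

Open-Source Tools

Classic Papers

Related Papers

Seq2Seq-Based Methods

Variational Autoencoder Based Methods

GAN-Based Methods

Reinforcement Learning Based Methods

Knowledge-Based Methods

Style Transfer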

Public Datasets

Story Generation

ROCStories: Mostafazadeh, Nasrin and Chambers, Nathanael and He, Xiaodong and Parikh, Devi and Batra, Dhruv and Vanderwende, Lucy and Kohli, Pushmeet and Allen, James. 2016. A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories. In Proceedings of NAACL-HLT 2016.

VIST: Huang, Ting-Hao (Kenneth) and Ferraro, Francis and Mostafazadeh, Nasrin and Misra, Ishan and Agrawal, Aishwarya and Devlin, Jacob and Girshick, Ross and He, Xiaodong and Kohli, Pushmeet and Batra, Dhruv and Zitnick, C. Lawrence and Parikh, Devi and Vanderwende, Lucy and Galley, Michel and Mitchell, Margaret. 2016. Visual Storytelling. In Proceedings of NAACL-HLT 2016.

WritingPrompts: Fan, Angela and Lewis, Mike and Dauphin, Yann. 2018. Hierarchical Neural Story Generation. In Proceedings of ACL 2018.

Text Generation

Yelp Review Generation Dataset: Xu, Jingjing and Ren, Xuancheng and Lin, Junyang and Sun, Xu. 2018. Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation. In Proceedings of EMNLP 2018.

Amazon Review Generation Dataset: McAuley, Julian John and Leskovec, Jure. 2013. From Amateurs to Connoisseurs: Modeling The Evolution of User Expertise Through Online Reviews. In Proceedings of WWW 2013.

Zhihu Dataset and Composition Dataset: Feng, Xiaocheng and Liu, Ming and Liu, Jiahao and Qin, Bing and Sun, Yibo and Liu, Ting. 2018. Topic-to-essay generation with neural networks. In Proceedings of IJCAI 2018.

ACL Title and Abstract Dataset: Wang, Qingyun and Zhou, Zhihao and Huang, Lifu and Whitehead, Spencer and Zhang, Boliang and Ji, Heng and Knight, Kevin. 2018. Paper Abstract Writing through Editing Mechanism. In Proceedings of ACL 2018.

AGENDA Dataset: Koncel-Kedziorski, Rik and Bekal, Dhanush and Luan, Yi and Lapata, Mirella and Hajishirzi, Hannaneh. 2019. Text Generation from Knowledge Graphs with Graph Transformers. In Proceedings of NAACL-HLT 2019.

Open-Source Tools

Hu, Zhiting and Yang, Zichao and Zhao, Tiancheng and Shi, Haoran and He, Junxian and Wang, Di and Ma, Xuezhe and Liu, Zhengzhong and Liang, Xiaodan and Qin, Lianhui and others. 2018. Texar: A Modularized, Versatile, and Extensible Toolbox for Text Generation. In Proceedings of ACL 2018. (GitHub)

Zhu, Yaoming and Lu, Sidi and Zheng, Lei and Guo, Jiaxian and Zhang, Weinan and Wang, Jun and Yu, Yong. 2018. Texygen: A Benchmarking Platform for Text Generation Models. In Proceedings of SIGIR 2018. (GitHub)

Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1:8. (GitHub)

Goldfarb-Tarrant, Seraphina and Feng, Haining and Peng, Nanyun. 2019. Plan, Write, and Revise: an Interactive System for Open-Domain Story Generation. In Proceedings of NAACL-HLT 2019. (GitHub)

Classic Papers

Related Papers

Kingma, Diederik P and Welling, Max. 2014. Auto-Encoding Variational Bayes. In Proceedings of ICLR 2014. (Citation: 4,317)

Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V. 2014. Sequence to Sequence Learning with Neural Networks. In Proceedings of NeurIPS 2014. (Citation: 6,076)

Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua. 2014. Generative Adversarial Nets. In Proceedings of NeurIPS 2014. (Citation: 7,952)

Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of ICLR 2015. (Citation: 6,317)

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, Lukasz and Polosukhin, Illia. 2017. Attention Is All You Need. In Proceedings of NeurIPS 2017. (Citation: 1,393)

Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT 2019. (Citation: 345)

Seq2Seq-Based Methods

Huang, Ting-Hao (Kenneth) and Ferraro, Francis and Mostafazadeh, Nasrin and Misra, Ishan and Agrawal, Aishwarya and Devlin, Jacob and Girshick, Ross and He, Xiaodong and Kohli, Pushmeet and Batra, Dhruv and Zitnick, C. Lawrence and Parikh, Devi and Vanderwende, Lucy and Galley, Michel and Mitchell, Margaret. 2016. Visual Storytelling. In Proceedings of NAACL-HLT 2016. (Citation: 76)

Roemmele, Melissa. 2016. Writing Stories with Help from Recurrent Neural Networks. In Proceedings of AAAI 2016. (Citation: 13)

Jain, Parag and Agrawal, Priyanka and Mishra, Abhijit and Sukhwani, Mohak and Laha, Anirban and Sankaranarayanan, Karthik. 2017. Story Generation from Sequence of Independent Short Descriptions. arXiv preprint arXiv:1707.05501. (Citation: 10)

Liu, Tianyu and Wang, Kexiang and Sha, Lei and Chang, Baobao and Sui, Zhifang. 2018. Table-to-text Generation by Structure-aware Seq2seq Learning. In Proceedings of AAAI 2018. (Citation: 14)

Fan, Angela and Lewis, Mike and Dauphin, Yann. 2018. Hierarchical Neural Story Generation. In Proceedings of ACL 2018. (Citation: 18)

Song, Linfeng and Zhang, Yue and Wang, Zhiguo and Gildea, Daniel. 2018. A Graph-to-Sequence Model for AMR-to-Text Generation. In Proceedings of ACL 2018. (Citation: 10)

Martin, Lara J and Ammanabrolu, Prithviraj and Wang, Xinyu and Hancock, William and Singh, Shruti and Harrison, Brent and Riedl, Mark O. 2018. Event Representations for Automated Story Generation with Deep Neural Nets. In Proceedings of AAAI 2018. (Citation: 30)

Clark, Elizabeth and Ji, Yangfeng and Smith, Noah A. 2018. Neural Text Generation in Stories Using Entity Representations as Context. In Proceedings of NAACL-HLT 2018. (Citation: 7)

Wiseman, Sam and Shieber, Stuart and Rush, Alexander. 2018. Learning Neural Templates for Text Generation. In Proceedings of EMNLP 2018. (Citation: 5)

Chaturvedi, Snigdha and Peng, Haoruo and Roth, Dan. 2017. Story Comprehension for Predicting What Happens Next. In Proceedings of EMNLP 2017. (Citation: 15)

Zhang, Yue and Liu, Qi and Song, Linfeng. 2018. Sentence-State LSTM for Text Representation. In Proceedings of ACL 2018. (Citation: 5)

Kezar, Lee. 2018. Mixed Feelings: Natural Text Generation with Variable, Coexistent Affective Categories. In Proceedings of ACL 2018, Student Research Workshop.

Welleck, Sean and Brantley, Kianté and Daumé III, Hal and Cho, Kyunghyun. 2019. Non-Monotonic Sequential Text Generation. In Proceedings of ICML 2019. (Citation: 1)

Pappas, Nikolaos and Henderson, James. 2019. Deep Residual Output Layers for Neural Language Generation. In Proceedings of ICML 2019.

Moryossef, Amit and Goldberg, Yoav and Dagan, Ido. 2019. Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation. In Proceedings of NAACL-HLT 2019.

Shen, Sheng and Fried, Daniel and Andreas, Jacob and Klein, Dan. 2019. Pragmatically Informative Text Generation. In Proceedings of NAACL-HLT 2019.

Fan, Angela and Lewis, Mike and Dauphin, Yann. 2019. Strategies for Structuring Story Generation. In Proceedings of ACL 2019.

Variational Autoencoder Based Methods

Li, Jiwei and Luong, Thang and Jurafsky, Dan. 2015. A Hierarchical Neural Autoencoder for Paragraphs and Documents. In Proceedings of ACL 2015. (Citation: 283)

Semeniuta, Stanislau and Severyn, Aliaksei and Barth, Erhardt. 2017. A Hybrid Convolutional Variational Autoencoder for Text Generation. In Proceedings of EMNLP 2017. (Citation: 57)

Serban, Iulian Vlad and Ororbia II, Alexander and Pineau, Joelle and Courville, Aaron. 2017. Piecewise Latent Variables for Neural Variational Text Processing. In Proceedings of EMNLP 2017. (Citation: 11)

Yang, Zichao and Hu, Zhiting and Salakhutdinov, Ruslan and Berg-Kirkpatrick, Taylor. 2017. Improved Variational Autoencoders for Text Modeling using Dilated Convolutions. In Proceedings of ICML 2017. (Citation: 72)

Hu, Zhiting and Yang, Zichao and Liang, Xiaodan and Salakhutdinov, Ruslan and Xing, Eric P. 2017. Toward Controlled Generation of Text. In Proceedings of ICML 2017. (Citation: 120)

Deng, Yuntian and Kim, Yoon and Chiu, Justin and Guo, Demi and Rush, Alexander. 2018. Latent Alignment and Variational Attention. In Proceedings of NeurIPS 2018. (Citation: 9)

Kim, Yoon and Wiseman, Sam and Miller, Andrew C and Sontag, David and Rush, Alexander M. 2018. Semi-Amortized Variational Autoencoders. In Proceedings of ICML 2018. (Citation: 27)

Bahuleyan, Hareesh and Mou, Lili and Vechtomova, Olga and Poupart, Pascal. 2018. Variational Attention for Sequence-to-Sequence Models. In Proceedings of COLING 2018. (Citation: 14)

Xu, Jiacheng and Durrett, Greg. 2018. Spherical Latent Spaces for Stable Variational Autoencoders. In Proceedings of EMNLP 2018. (Citation: 6)

Yoo, Kang Min and Shin, Youhyun and Lee, Sang-goo. 2019. Data Augmentation for Spoken Language Understanding via Joint Variational Generation. In Proceedings of AAAI 2019. (Citation: 2)

Wang, Wenlin and Gan, Zhe and Xu, Hongteng and Zhang, Ruiyi and Wang, Guoyin and Shen, Dinghan and Chen, Changyou and Carin, Lawrence. 2019. Topic-Guided Variational Auto-Encoder for Text Generation. In Proceedings of NAACL-HLT 2019.

Bahuleyan, Hareesh and Mou, Lili and Vamaraju, Kartik and Zhou, Hao and Vechtomova, Olga. 2019. Probabilistic Natural Language Generation with Wasserstein Autoencoders. In Proceedings of NAACL-HLT 2019. (Citation: 3)

Gu, Xiaodong and Cho, Kyunghyun and Ha, Jung-Woo and Kim, Sunghun. 2019. DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder. In Proceedings of ICLR 2019. (Citation: 9)

Zhang, Xinyuan and Yang, Yi and Yuan, Siyang and Shen, Dinghan and Carin, Lawrence. 2019. Syntax-Infused Variational Autoencoder for Text Generation. In Proceedings of ACL 2019.

Shen, Dinghan and Celikyilmaz, Asli and Zhang, Yizhe and Chen, Liqun and Wang, Xin and Gao, Jianfeng and Carin, Lawrence. 2019. Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models. In Proceedings of ACL 2019.

GAN-Based Methods

Kusner, Matt J and Hernández-Lobato, José Miguel. 2016. GANs for Sequences of Discrete Elements with the Gumbel-softmax Distribution. arXiv preprint arXiv:1611.04051. (Citation: 71)

Gulrajani, Ishaan and Ahmed, Faruk and Arjovsky, Martin and Dumoulin, Vincent and Courville, Aaron C. 2017. Improved Training of Wasserstein GANs. In Proceedings of NeurIPS 2017. (Citation: 1,102)

Yu, Lantao and Zhang, Weinan and Wang, Jun and Yu, Yong. 2017. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In Proceedings of AAAI 2017. (Citation: 436)

Liang, Xiaodan and Hu, Zhiting and Zhang, Hao and Gan, Chuang and Xing, Eric P. 2017. Recurrent Topic-Transition GAN for Visual Paragraph Generation. In Proceedings of ICCV 2017. (Citation: 65)

Zhang, Yizhe and Gan, Zhe and Fan, Kai and Chen, Zhi and Henao, Ricardo and Shen, Dinghan and Carin, Lawrence. 2017. Adversarial Feature Matching for Text Generation. In Proceedings of ICML 2017. (Citation: 68)

Guo, Jiaxian and Lu, Sidi and Cai, Han and Zhang, Weinan and Yu, Yong and Wang, Jun. 2018. Long Text Generation via Adversarial Training with Leaked Information. In Proceedings of AAAI 2018. (Citation: 46)

Xu, Jingjing and Ren, Xuancheng and Lin, Junyang and Sun, Xu. 2018. Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation. In Proceedings of EMNLP 2018. (Citation: 2)

Mroueh, Youssef and Li, Chun-Liang and Sercu, Tom and Raj, Anant and Cheng, Yu. 2018. Sobolev GAN. In Proceedings of ICLR 2018. (Citation: 22)

Fedus, William and Goodfellow, Ian and Dai, Andrew M. 2018. MaskGAN: Better Text Generation via Filling in the ______. In Proceedings of ICLR 2018. (Citation: 58)

Li, Jianing and Lan, Yanyan and Guo, Jiafeng and Xu, Jun and Cheng, Xueqi. 2019. Differentiated Distribution Recovery for Neural Text Generation. In Proceedings of AAAI 2019.

Nie, Weili and Narodytska, Nina and Patel, Ankit. 2019. RelGAN: Relational Generative Adversarial Networks for Text Generation. In Proceedings of ICLR 2019. (Citation: 5)

Chen, Francine and Chen, Yan-Ying. 2019. Adversarial Domain Adaptation Using Artificial Titles for Abstractive Title Generation. In Proceedings of ACL 2019.

Reinforcement Learning Based Methods

Lin, Kevin and Li, Dianqi and He, Xiaodong and Zhang, Zhengyou and Sun, Ming-Ting. 2017. Adversarial Ranking for Language Generation. In Proceedings of NeurIPS 2017. (Citation: 54)

Che, Tong and Li, Yanran and Zhang, Ruixiang and Hjelm, R Devon and Li, Wenjie and Song, Yangqiu and Bengio, Yoshua. 2017. Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. arXiv preprint arXiv:1702.07983. (Citation: 64)

Xu, Jingjing and Zhang, Yi and Zeng, Qi and Ren, Xuancheng and Cai, Xiaoyan and Sun, Xu. 2018. A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation. In Proceedings of EMNLP 2018. (Citation: 4)

Wang, Xin and Chen, Wenhu and Wang, Yuan-Fang and Wang, William Yang. 2018. No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling. In Proceedings of ACL 2018. (Citation: 19)

Hjelm, R Devon and Jacob, Athul Paul and Che, Tong and Trischler, Adam and Cho, Kyunghyun and Bengio, Yoshua. 2018. Boundary-Seeking Generative Adversarial Networks. In Proceedings of ICLR 2018. (Citation: 52)

Shi, Zhan and Chen, Xinchi and Qiu, Xipeng and Huang, Xuanjing. 2018. Towards Diverse Text Generation with Inverse Reinforcement Learning. In Proceedings of IJCAI 2018. (Citation: 4)

Subramanian, Sandeep and Mudumba, Sai Rajeswar and Sordoni, Alessandro and Trischler, Adam and Courville, Aaron C and Pal, Chris. 2018. Towards Text Generation with Adversarially Learned Neural Outlines. In Proceedings of NeurIPS 2018. (Citation: 2)

Huang, Qiuyuan and Gan, Zhe and Celikyilmaz, Asli and Wu, Dapeng and Wang, Jianfeng and He, Xiaodong. 2019. Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation. In Proceedings of AAAI 2019. (Citation: 9)

Hashimoto, Kazuma and Tsuruoka, Yoshimasa. 2019. Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction. In Proceedings of NAACL-HLT 2019.

Chan, Hou Pong and Chen, Wang and Wang, Lu and King, Irwin. 2019. Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards. In Proceedings of ACL 2019.

Knowledge-Based Methods

Liu, Hugo and Singh, Push. 2002. MAKEBELIEVE: Using Commonsense Knowledge to Generate Stories. In Proceedings of AAAI 2002. (Citation: 86)

Yang, Bishan and Mitchell, Tom. 2017. Leveraging Knowledge Bases in LSTMs for Improving Machine Reading. In Proceedings of ACL 2017. (Citation: 36)

Ghazvininejad, Marjan and Brockett, Chris and Chang, Ming-Wei and Dolan, Bill and Gao, Jianfeng and Yih, Wen-tau and Galley, Michel. 2018. A Knowledge-Grounded Neural Conversation Model. In Proceedings of AAAI 2018. (Citation: 61)

Li, Qian and Li, Ziwei and Wei, Jin-Mao and Gu, Yanhui and Jatowt, Adam and Yang, Zhenglu. 2018. A Multi-Attention Based Neural Network with External Knowledge for Story Ending Predicting Task. In Proceedings of COLING 2018. (Citation: 4)

Guan, Jian and Wang, Yansen and Huang, Minlie. 2019. Story Ending Generation with Incremental Encoding and Commonsense Knowledge. In Proceedings of AAAI 2019. (Citation: 3)

Chen, Jiaao and Chen, Jianshu and Yu, Zhou. 2019. Incorporating Structured Commonsense Knowledge in Story Completion. In Proceedings of AAAI 2019.

Shang, Mingyue and Fu, Zhenxin and Yin, Hongzhi and Tang, Bo and Zhao, Dongyan and Yan, Rui. 2019. Find a Reasonable Ending for Stories: Does Logic Relation Help the Story Cloze Test?. In Student Abstract of AAAI 2019.

Li, Christy Y. and Liang, Xiaodan and Hu, Zhiting and Xing, Eric P. 2019. Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation. In Proceedings of AAAI 2019. (Citation: 4)

Koncel-Kedziorski, Rik and Bekal, Dhanush and Luan, Yi and Lapata, Mirella and Hajishirzi, Hannaneh. 2019. Text Generation from Knowledge Graphs with Graph Transformers. In Proceedings of NAACL-HLT 2019.

Hajdik, Valerie and Buys, Jan and Goodman, Michael W. and Bender, Emily M. 2019. Neural Text Generation from Rich Semantic Representations. In Proceedings of NAACL-HLT 2019.

Yang, Pengcheng and Luo, Fuli and Chen, Peng and Li, Lei and Chang, Baobao and Sui, Zhifang and Sun, Xu. 2019. Knowledgeable Storyteller: A Commonsense-Driven Generative Model for Visual Storytelling. In Proceedings of IJCAI 2019.

Yang, Pengcheng and Li, Lei and Luo, Fuli and Liu, Tianyu and Sun, Xu. 2019. Enhancing Topic-to-Essay Generation with External Commonsense Knowledge. In Proceedings of ACL 2019.

Style Transfer

Hu, Zhiting and Yang, Zichao and Liang, Xiaodan and Salakhutdinov, Ruslan and Xing, Eric P. 2017. Toward Controlled Generation of Text. In Proceedings of ICML 2017. [code] (Citation: 179)

Shen, Tianxiao and Lei, Tao and Barzilay, Regina and Jaakkola, Tommi. 2017. Style Transfer from Non-Parallel Text by Cross-Alignment. In Proceedings of NeurIPS 2017. [code] (Citation: 123)

Han, Mengqiao and Wu, Ou and Niu, Zhendong. 2017. Unsupervised Automatic Text Style Transfer using LSTM. In Proceedings of NLPCC 2017. (Citation: 5)

Li, Juncen and Jia, Robin and He, He and Liang, Percy. 2018. Delete, retrieve, generate: A simple approach to sentiment and style transfer. In Proceedings of NAACL-HLT 2018. [code] (Citation: 53)

Zhang, Ye and Ding, Nan and Soricut, Radu. 2018. SHAPED: Shared-Private Encoder-Decoder for Text Style Adaptation. In Proceedings of NAACL-HLT 2018. (Citation: 9)

Prabhumoye, Shrimai and Tsvetkov, Yulia and Salakhutdinov, Ruslan and Black, Alan W. 2018. Style Transfer Through Back-Translation. In Proceedings of ACL 2018. [code] (Citation: 47)

Xu, Jingjing and Sun, Xu and Zeng, Qi and Ren, Xuancheng and Zhang, Xiaodong and Wang, Houfeng and Li, Wenjie. 2018. Unpaired sentiment-to-sentiment translation: A cycled reinforcement learning approach. In Proceedings of ACL 2018. [code] (Citation: 21)

Santos, Cicero Nogueira dos and Melnyk, Igor and Padhi, Inkit. 2018. Fighting offensive language on social media with unsupervised text style transfer. In Proceedings of ACL 2018. (Citation: 9)

Yang, Zichao and Hu, Zhiying and Dyer, Chris and Xing, Eric P. and Berg-Kirkpatrick, Taylor. 2018. Unsupervised Text Style Transfer using Language Models as Discriminators. In Proceedings of NeurIPS 2018. (Citation: 31)

Zhang, Zhirui and Ren, Shuo and Liu, Shujie and Wang, Jianyong and Chen, Peng and Li, Mu and Zhou, Ming and Chen, Enhong. 2018. Style Transfer as Unsupervised Machine Translation. arXiv preprint arXiv:1808.07894. (Citation: 5)

Gong, Hongyu and Bhat, Suma and Wu, Lingfei and Xiong, Jinjun and Hwu, Wen-mei. 2019. Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus. In Proceedings of NAACL-HLT 2019. (Citation: 1)

Luo, Fuli and Li, Peng and Zhou, Jie and Yang, Pengcheng and Chang, Baobao and Sui, Zhifang and Sun, Xu. 2019. A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer. In Proceedings of IJCAI 2019. [code] (Citation: 3)

Lee, Joseph and Xie, Ziang and Wang, Cindy and Drach, Max and Jurafsky, Dan and Ng, Andrew Y. 2019. Neural Text Style Transfer via Denoising and Reranking. In Proceedings of ACL 2019 Workshop.