Datasets (Part 3) | 100+ Datasets for Artificial Intelligence: Bookmark Them Now!

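Learning data analysis takes continual hands-on practice, but many readers struggle to find suitable datasets to practice on. We have compiled more than 100 datasets from the field of artificial intelligence, so there should be one that suits you. Bookmark and like this post!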
01. NLP Corpus Datasets
1. Xinwen Lianbo (CCTV evening news) corpus, 2016-2019 (11.3MB)
https://www.heywhale.com/mw/dataset/5d2d344c688d36002c5da8e5
2. Chinese rumor corpus (32.6MB)
https://www.heywhale.com/mw/dataset/5d257f87688d36002c579342
3. Chinese couplets dataset (28.2MB)
https://www.heywhale.com/mw/dataset/5c46e6f42d8ef5002b736d6d
4. 1998 People's Daily annotated corpus (PFR) (10.2MB)
https://www.heywhale.com/mw/dataset/5ce7983cd10470002b334de3
5. People's Daily articles dataset, 1979-2010 (811.9MB)
https://www.heywhale.com/mw/dataset/5c862b1ad635ff002ca2eb19
6. People's Daily articles dataset, 1949-1978 (559.4MB)
https://www.heywhale.com/mw/dataset/5c8626031e7104002b380a4b
7. Chinese news dataset (70.3MB)
https://www.heywhale.com/mw/dataset/5d8878638499bc002c1148f7
8. Yale text-to-SQL challenge dataset (95.1MB)
https://www.heywhale.com/mw/dataset/5d48f322c143cf002bf36319
9. National University of Singapore SMS corpus (23.4MB)
https://www.heywhale.com/mw/dataset/5d3ea76acf76a600361e9aa0
10. Chinese classical texts corpus
https://www.heywhale.com/mw/dataset/5d11e717708b90002c4d2983
11. Informal Chinese dataset (214.5MB)
https://www.heywhale.com/mw/dataset/5d1c45459f53a9002ce35b61
12. Chinese Wikipedia corpus (518.7MB)
https://www.heywhale.com/mw/dataset/5d1ee7939f53a9002ce5910e
13. Dataset of the 9,933 most frequently used Chinese characters (1.0MB)
https://www.heywhale.com/mw/dataset/5d8dd076037db3002d3a715c
14. Chat corpus dataset (210.7MB)
https://www.heywhale.com/mw/dataset/5dee1459953ca8002c9678a6
15. Short text classification dataset (13.1MB)
https://www.heywhale.com/mw/dataset/5dd645fca0cb22002c94e65d/file
16. Idiom reading comprehension dataset (195.8MB)
https://www.heywhale.com/mw/dataset/5ddf91e8ca27f8002c4ad48d
17. Automated essay scoring dataset (78.8MB)
https://www.heywhale.com/mw/dataset/5de0c5ccca27f8002c4b178a
18. Translation corpus (595.9MB)
https://www.heywhale.com/mw/dataset/5de5fcafca27f8002c4ca993
19. Chinese scientific literature abstracts dataset (92.9MB)
https://www.heywhale.com/mw/dataset/5de72db2ca27f8002c4ce7b4
20. English Wikipedia corpus (89.0MB)
https://www.heywhale.com/mw/dataset/5ddba2c9f41512002cebfef6
21. Lord of the Rings data (223.9KB)
https://www.heywhale.com/mw/dataset/5da83b27c83fb400420c5707
22. Span-extraction dataset for Chinese machine reading comprehension (CMRC 2018)
https://www.heywhale.com/mw/dataset/5e7b180798d4a8002d2d3af6
23. 36Kr news dataset (42.5MB)
https://www.heywhale.com/mw/dataset/5eb68e91366f4d002d77d08d
24. 10,000 Amazon musical instrument reviews (13MB)
https://www.heywhale.com/mw/dataset/5e980ce4ebb37f002c5feccc
25. 10,000 internet column articles dataset (75.7MB)
https://www.heywhale.com/mw/dataset/5ebba2de0bff1b002ce6d6a7
26. 20,000 Chinese financial news articles dataset (66.6MB)
https://www.heywhale.com/mw/dataset/5eb69242366f4d002d77d2b7
27. Chinese book classification dataset (49.8MB)
https://www.heywhale.com/mw/dataset/5ecf5a25162df90036ddec65
28. English song lyrics dataset (69.1MB)
https://www.heywhale.com/mw/dataset/5aab8085afaabd5e93e4e027
29. Statements and briefings issued by the Trump administration (63.6MB)
https://www.heywhale.com/mw/dataset/5fae515f7d1e6d0030d68088
02. Question Answering Datasets
1. Financial industry Q&A dataset (245.5MB)
https://www.heywhale.com/mw/dataset/5e9588f8e7ec38002d0331b1
2. Community Q&A dataset (1.7GB)
https://www.heywhale.com/mw/dataset/5de601f3ca27f8002c4cac47
3. Chinese medical Q&A dataset (85MB)
https://www.heywhale.com/mw/dataset/5d313070cf76a60036e4b023
4. 120,000 question-answer pairs from CNN news articles (17.3MB)
https://www.heywhale.com/mw/dataset/5eef1408caa99b002d6e37cc
03. Sentiment Analysis Datasets
1. Stanford Sentiment Treebank: a standard sentiment dataset with sentiment annotations (6.1MB)
https://www.heywhale.com/mw/dataset/5daa748c1035d8002c35cdee
2. Sentiment analysis dataset of tweets about US airlines (2.6MB)
https://www.heywhale.com/mw/dataset/5dab23781035d8002c3634c9
3. Chinese dialogue emotion corpus (1.1MB)
https://www.heywhale.com/mw/dataset/5d00c390e727f8002c4599ad
4. Multi-domain sentiment dataset (51.2MB)
https://www.heywhale.com/mw/dataset/5de5ce0aca27f8002c4c9ee8
5. sentiment140 sentiment analysis dataset (72.6KB)
https://www.heywhale.com/mw/dataset/5ca46f1a8408c1002b498cca
04. Web-Scraped Datasets
1. 6,000 posts from Jay Chou's Weibo super topic (1.1MB)
https://www.heywhale.com/mw/dataset/5d3551bdcf76a60036f605aa
2. 190,000 bullet comments (danmaku) from "Chinese Restaurant 3" (12.8MB)
https://www.heywhale.com/mw/dataset/5d7b69798499bc002c0d3ec5
3. bilibili popular anime review data (2.3MB)
https://www.heywhale.com/mw/dataset/5d3a76dfcf76a600360e19c9
4. Electric fan reviews from a Taobao store (273.9KB)
https://www.heywhale.com/mw/dataset/5d442caec143cf002bdb687c
5. 7,000 travel notes on popular domestic attractions from Mafengwo (140+MB)
https://www.heywhale.com/mw/dataset/5e7c55db98d4a8002d2db5a2
6. IMDB movie review data (32.0MB)
https://www.heywhale.com/mw/dataset/5d143d41708b90002c5f7021
7. Weiming BBS hot topics (3.6MB)
https://www.heywhale.com/mw/dataset/5dad84b375df5c002b20d79f
8. All WeChat official account articles by Mimeng (3.9MB)
https://www.heywhale.com/mw/dataset/5c8723441e7104002b3831d3
9. McDonald's negative dining reviews dataset (891.1KB)
https://www.heywhale.com/mw/dataset/5dab2c0b1035d8002c36372b
05. Entity Recognition Datasets
1. Annotated corpus for named entity recognition (26.4MB)
https://www.heywhale.com/mw/dataset/5de9be34953ca8002c95c35f
2. Chinese NER data used with LatticeLSTM (191.5KB)
https://www.heywhale.com/mw/dataset/5d564ea0c143cf002b235181
3. Medical named entity recognition dataset (5.1MB)
https://www.heywhale.com/mw/dataset/5dedef59953ca8002c96667a
4. Chinese entity relation extraction dataset (8.1MB)
https://www.heywhale.com/mw/dataset/5dde487dca27f8002c4a8352
5. Financial negative-information and subject determination competition dataset (17MB)
https://www.heywhale.com/mw/dataset/5e09a9eb2823a10036b126c0
06. Computer Vision (CV) Datasets
1. Pronto bike share dataset (70.8MB)
https://www.heywhale.com/mw/dataset/58a515c48460306efcce2e96
2. Fashion-MNIST image dataset (200.4MB)
https://www.heywhale.com/mw/dataset/5a0cfcf860680b295c28a753
3. CIFAR100 dataset (161.3MB)
https://www.heywhale.com/mw/dataset/5e96da12e7ec38002d03bf51
4. Vehicle dataset (vehicle recognition and classification) (62.5MB)
https://www.heywhale.com/mw/dataset/5bc316173631bc00109d2abf
5. Garbage classification dataset
https://www.heywhale.com/mw/dataset/5d133d11708b90002c570588
6. Another garbage classification dataset (40.9MB)
https://www.heywhale.com/mw/dataset/5d1578e4708b90002c6a3238
7. CIFAR10 dataset (148MB)
https://www.heywhale.com/mw/dataset/5ab3403bfdf6b86c23f259e3
8. GTSRB: German traffic sign recognition image data (253.3MB)
https://www.heywhale.com/mw/dataset/5d8db5ca037db3002d3a5ba0
9. Hand gesture recognition database (1.1GB)
https://www.heywhale.com/mw/dataset/5d8da999037db3002d3a523
10. Facial expressions of emotion (170MB+)
https://www.heywhale.com/mw/dataset/5d773bd68499bc002c0c4a6f
11. Gun object detection (2.4MB)
https://www.heywhale.com/mw/dataset/5d576984c143cf002b238528
12. Face image data (294.1MB)
https://www.heywhale.com/mw/dataset/5da6d4cac83fb4004206edf0
13. RMFD masked face dataset (610.3MB)
https://www.heywhale.com/mw/dataset/5e81a24b246a590036b884d5
14. Chinese traffic police gesture dataset (1.8GB)
https://www.heywhale.com/mw/dataset/5de75df5ca27f8002c4cf1bb
15. Scene classification dataset (105.9MB)
https://www.heywhale.com/mw/dataset



