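At long last, here is the code for collecting figures from PNAS. I spent quite a while on it and still could not work out how to fetch figures from paywalled articles, so it only downloads figures from open-access articles (partly out of concern about copyright). Even so, three years of open-access articles yield close to 17,000 images. Running the code still requires a VPN or proxy, so I still recommend going straight to the end of this post and downloading the image archives I have already prepared.
Here is the code anyway. To use it, call getPNASJPG(YEAR) from the MATLAB command window, where YEAR is the publication year, for example getPNASJPG(2022). If you hit a 403 error, wait a while and run it again; it usually clears up on its own. The code: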
function getPNASJPG(YEAR)
% getPNASJPG  Download figures from open-access PNAS research articles.
%   getPNASJPG(YEAR) saves the images of the given year to ./image_Year_<YEAR>
%   and records progress in ijkbreak.mat so an interrupted run can resume.
if nargin < 1
    YEAR = 2023;
end
YEAR = num2str(YEAR);
% Archive pages are grouped by decade and year, e.g. 'd2020.y2023'
str_YEAR = ['d',YEAR(1:3),'0','.y',YEAR];
options = weboptions('Timeout',inf);
url_archive = ['https://www.pnas.org/loi/pnas/group/',str_YEAR];
html_archive = webread(url_archive,options);

% Read the volume number and the latest issue number from the first issue
% entry on the archive page (the text around its two '|' separators)
A_issue = strfind(html_archive,'past-issue__content__item--all-details d-flex flex-column');
str_issue = html_archive(A_issue(1)+50:A_issue(1)+100);
S1_issue = strfind(str_issue,'|');
S2_issue = strfind(str_issue,'</h2>');
str1_issue = str_issue(S1_issue(1):S1_issue(2));
str2_issue = str_issue(S1_issue(2):S2_issue);
num1_issue = str2double(str1_issue(str1_issue>='0' & str1_issue<='9'));   % volume
num2_issue = str2double(str2_issue(str2_issue>='0' & str2_issue<='9'));   % latest issue

% Resume from the saved checkpoint if one exists
ibegin = 1; jbegin = 1; kbegin = 1;
folderName = ['Year_',YEAR];
imgDir = fullfile('.',['image_',folderName]);
ckptFile = fullfile(imgDir,'ijkbreak.mat');
if exist(ckptFile,'file')
    load(ckptFile,'ibegin','jbegin','kbegin');
end
if ~exist(imgDir,'dir')
    mkdir(imgDir);
end
disp([ibegin,jbegin,kbegin])

for i = ibegin:num2_issue
    url_issue = ['https://www.pnas.org/toc/pnas/',num2str(num1_issue),'/',num2str(i)];
    html_issue = webread(url_issue,options);
    % Keep only the research-article part of the table of contents
    A_article = strfind(html_issue,'Research Article');
    Z_article = strfind(html_issue,'Recent Issues');
    html_issue = html_issue(A_article(1):Z_article(1));
    % Open-access markers, article links, and the end of each link
    B_article = strfind(html_issue,'icon-open-access');
    A_article = strfind(html_issue,'text-reset animation-underline');
    Z_article = strfind(html_issue,'title="');
    for j = jbegin:length(B_article)
        % First article link after the j-th open-access marker
        tA_article = A_article(find(B_article(j)<A_article,1));
        url_article = html_issue(tA_article:Z_article(find(Z_article>tA_article,1)));
        url_article = url_article(39:end-3);   % strip surrounding markup (fixed offsets)
        url_article = ['https://www.pnas.org',url_article];
        html_article = webread(url_article,options);
        % Image paths look like '/<article id>/asset/....jpg'
        A_JPG = strfind(html_article,[url_article(find(url_article=='/',1,'last'):end),'/asset/']);
        Z_JPG = strfind(html_article,'jpg" height=');
        for k = kbegin:length(A_JPG)
            try
                % Save the position first so an interrupted run restarts at this image
                ibegin = i; jbegin = j; kbegin = k;
                save(ckptFile,'ibegin','jbegin','kbegin')
                url_JPG = ['https://www.pnas.org/cms/10.1073',html_article(A_JPG(k):Z_JPG(k)+2)];
                name_JPG = fullfile(imgDir,url_JPG(find(url_JPG=='/',1,'last')+1:end));
                websave(name_JPG,url_JPG,options);
                disp(['Downloading Year-',YEAR,...
                    ' Issue-',num2str(i),' Article-',num2str(j),...
                    ' Pic-',num2str(k),':',url_article(22:end)])
            catch err
                % Skip images that fail, but report the reason instead of hiding it
                warning('getPNASJPG:downloadFailed','Skipped Pic-%d: %s',k,err.message);
            end
        end
        kbegin = 1;   % later articles start from their first image
    end
    jbegin = 1;   % later issues start from their first article
end
end
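The 403 errors mentioned above usually go away after a wait, so one option is to wrap each request in a retry loop. The sketch below is an assumption, not part of the original code: `retryCall` is a hypothetical helper name, and the defaults of 5 attempts with a 60-second pause are arbitrary choices.

```matlab
function out = retryCall(fcn, maxTries, waitSeconds)
% retryCall  Hypothetical helper (not in the original code): call the
%   function handle fcn, retrying after a pause whenever it throws.
if nargin < 2, maxTries = 5; end        % assumed default
if nargin < 3, waitSeconds = 60; end    % assumed default
for attempt = 1:maxTries
    try
        out = fcn();
        return
    catch err
        fprintf('Attempt %d of %d failed: %s\n', attempt, maxTries, err.message);
        if attempt == maxTries
            rethrow(err);               % give up and surface the last error
        end
        pause(waitSeconds);
    end
end
end
```

A request such as `webread(url_issue,options)` could then be written as `retryCall(@() webread(url_issue,options))`, so that a temporary refusal pauses the run rather than stopping it.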
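The code supports resumable downloads: you can interrupt it partway through and pick up where it left off later.
Also, if you come across a figure you really like and want to read the source article, the file name of each downloaded image records where it came from. For example, suppose you are interested in the figure below, named pnas.2212633120fig06:
Just enter the article link in your browser: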
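The resume mechanism boils down to saving the loop position before each unit of work and loading it at startup. Here is a minimal single-loop sketch of that pattern; `resumableLoop` and its `stopAt` argument (which simulates an interruption) are hypothetical names for illustration, and deleting the checkpoint after a complete run is my addition, which getPNASJPG does not do.

```matlab
function lastDone = resumableLoop(nItems, ckptFile, stopAt)
% resumableLoop  Hypothetical demo of checkpointed iteration.
%   Returns the index of the last item that was fully processed.
ibegin = 1;
if exist(ckptFile, 'file')
    S = load(ckptFile, 'ibegin');   % resume from the saved position
    ibegin = S.ibegin;
end
lastDone = ibegin - 1;
for i = ibegin:nItems
    ibegin = i;
    save(ckptFile, 'ibegin');       % record the position before working
    if i == stopAt
        return                      % simulated interruption
    end
    % ... the real work for item i (e.g. one download) would go here ...
    lastDone = i;
end
delete(ckptFile);                   % finished: the next run starts from 1
end
```

With `ckpt = [tempname,'.mat']`, a first call `resumableLoop(10, ckpt, 4)` stops while working on item 4, and a second call `resumableLoop(10, ckpt, 0)` picks up at item 4 and runs to the end. Because the position is saved before the work, the interrupted item is redone rather than skipped.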
- https://www.pnas.org/doi/10.1073/pnas.2212633120
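It is indeed Fig. 6, a perfect match!
Sample images
Figure quality in PNAS varies quite a lot, from very good to rather poor, so be selective about what you learn from. Here are some of the more interesting figures: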
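The lookup from file name to article can also be done in code. The sketch below assumes every file name follows the pattern of the example above (`pnas.<digits>fig<number>`, optionally with an extension); `pnasFigureSource` is a hypothetical helper name, not part of the original code.

```matlab
function [url, figNumber] = pnasFigureSource(fileName)
% pnasFigureSource  Hypothetical helper: recover the article URL and the
%   figure number from a name such as 'pnas.2212633120fig06.jpg'.
tok = regexp(fileName, '(pnas\.\d+)fig(\d+)', 'tokens', 'once');
if isempty(tok)
    error('pnasFigureSource:badName', 'Unrecognized file name: %s', fileName);
end
url = ['https://www.pnas.org/doi/10.1073/', tok{1}];   % DOI prefix used by PNAS
figNumber = str2double(tok{2});                        % '06' becomes 6
end
```

For the example above, `[url, n] = pnasFigureSource('pnas.2212633120fig06.jpg')` gives the article link shown earlier and `n = 6`.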
2023
2022
2021
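Getting the images
Baidu Netdisk
Baidu Netdisk links for the images from the last three years, about 17,000 in total (16,861 by the per-archive counts below, roughly 14.6 GB):
2023 (2.49 GB, 3,209 images)
Link: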
https://pan.baidu.com/s/1YxRmt53jH-_TXGg6zkqtIg?pwd=slan
Extraction code: slan
2022, first half (3.12 GB, 3,329 images)
Link:
https://pan.baidu.com/s/1vFcEy48oOklW9UOUShVeAA?pwd=slan
Extraction code: slan
2022, second half (3.02 GB, 3,359 images)
Link:
https://pan.baidu.com/s/1ItVAmS18DcwlCNsM2u5rwg?pwd=slan
Extraction code: slan
2021, first half (2.61 GB, 3,077 images)
Link:
https://pan.baidu.com/s/1XHYlxR9_s1Ly9LCtlfnrhQ?pwd=slan
Extraction code: slan
2021, second half (3.35 GB, 3,887 images)
Link:
https://pan.baidu.com/s/1uCUoi_hUUKlZ3kfc2oI4Yw?pwd=slan
Extraction code: slan
Gitee repository
If the Netdisk links stop working, you can get the latest links from the Gitee repository:
https://gitee.com/slandarer/pnas-figures