Parsing Baidu Cloud share links with Selenium

Published: 2019-10-22 05:13  Category: Other frameworks

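This post uses Selenium browser automation to turn the contents of a Baidu Cloud share link into a list.

The result is similar to the final output of this post:

http://www.javaxl.com/blog/articles/220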

 

 

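Implementation approach

1. Read one Baidu Cloud share link and its extraction code from the database. Selenium then opens a browser, loads the share link, fills in the extraction code, and clicks the extract button. After that, parsing of the share's contents begins.

2. First, use the page's CSS classes to decide whether the user shared a directory or a single file. A single file is easy to handle: its file name can be read directly, with no recursion.

3. For a directory, each directory found in the listing increments catalogNumber, the number of directories to process at that level. When a single folder is shared, catalogNumber is 1 at the first level; at deeper levels it grows by one for every directory in that level's listing. Files are added to fileEleList and their names are appended to treeInfo. On the third level the code records every row as a leaf and does not descend further.

4. The entries collected in fileEleList have already been handled by Selenium, so they are removed from elements. Whatever remains in elements has not been processed yet.

5. If there are no directories left to process, Selenium navigates back to the parent level. If the current level still has directories to process, the level's traversal state is saved in allTreeInfo.

6. The TreeLevel to process is fetched from allTreeInfo. Its directory name is appended to treeInfo first, and that directory is then processed recursively.

7. When the last directory on the third level is done, Selenium navigates back to the last directory on the second level, then back again to the first level. At that point the recursion ends, and so does the program.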
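The listing order described in the steps above can be tried out without a browser. What follows is a minimal sketch, not the code from this post: it swaps the Selenium page for a hypothetical in-memory tree (`Node`, `file`, `dir`, `list`, and `demo` are names invented for illustration) and reproduces only the listing rule. At each level the files are recorded first, then each directory is recorded and entered, and on the third level every row is recorded without descending.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShareTreeSketch {
    static final int MAX_LEVEL = 3; // the post lists at most three levels

    /** Hypothetical stand-in for one row of the share page. */
    static final class Node {
        final String name;
        final List<Node> children; // null marks a file

        Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }

        boolean isDirectory() {
            return children != null;
        }
    }

    static Node file(String name) {
        return new Node(name, null);
    }

    static Node dir(String name, Node... children) {
        return new Node(name, Arrays.asList(children));
    }

    /** Indentation rule of printLine: nothing at level 1, then padding plus "|____". */
    static String prefix(int level) {
        StringBuilder sb = new StringBuilder();
        for (int i = 2; i <= level; i++) {
            sb.append(i < level ? "    " : "|____");
        }
        return sb.toString();
    }

    /** Records files first, then each directory followed by its own listing. */
    static void list(List<Node> rows, int level, List<String> out) {
        List<Node> directories = new ArrayList<>();
        for (Node row : rows) {
            if (!row.isDirectory() || level == MAX_LEVEL) {
                out.add(prefix(level) + row.name); // a file, or any row on the third level
            } else {
                directories.add(row);              // entered after the files are recorded
            }
        }
        for (Node directory : directories) {
            out.add(prefix(level) + directory.name);
            list(directory.children, level + 1, out);
        }
    }

    /** A small made-up share with one folder nested four levels deep. */
    static List<String> demo() {
        List<Node> share = Arrays.asList(
                dir("docs",
                        file("a.txt"),
                        dir("img", file("x.png"), dir("deep", file("hidden.txt")))),
                file("readme.md"));
        List<String> out = new ArrayList<>();
        list(share, 1, out);
        return out;
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

In this made-up share, `hidden.txt` sits on a fourth level and therefore never shows up in the listing, while its parent folder `deep` is recorded as a third-level leaf.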

 

 

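Key variables in the core code:

treeInfo: a mutable string buffer that holds the listing that is finally printed.

allElement: the row objects of the share listing (file name, size, modification date, ...).

Purpose: used to tell whether a row is a directory or a file.

elements: the first-column objects of the share listing (file names only).

catalogNumber: the number of directories at the current level.

fileEleList: holds the files that Selenium has already parsed.

allTreeInfo: holds the traversal state of every level.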

 

Required entity class (marks the node that Selenium is currently processing)

package com.javaxl.selenium.entity;

import java.util.List;

/**
 * @author 小李飞刀
 * @site www.javaxl.com
 * @company
 * @create  2019-10-17 15:37
 *
 * level1
 *      level2.1
 *          level2.1_3.1
 *          level2.1_3.2
 *      level2.2
 *          level2.2_3.1
 *          level2.2_3.2
 *
 *  Processing the first level (level1)
 *      currentIndex:0
 *      levelCatalog:level1
 *
 *  Processing the second level:
 *  TreeLevel being processed: level2.1
 *      currentIndex:0
 *      levelCatalog:
 *          level2.1
 *          level2.2
 *  TreeLevel being processed: level2.2
 *      currentIndex:1
 *      levelCatalog:
 *          level2.1
 *          level2.2
 *
 *  Processing the third level:
 *  TreeLevel being processed: level2.1_3.1
 *      currentIndex:0
 *      levelCatalog:
 *          level2.1_3.1
 *          level2.1_3.2
 *  TreeLevel being processed: level2.1_3.2
 *      currentIndex:1
 *      levelCatalog:
 *          level2.1_3.1
 *          level2.1_3.2
 *
 *   ......
 *
 */
public class TreeLevel {
    private Integer currentIndex; // index of the directory being processed at this level (levels 1 to 3)
    private List<String> levelCatalog; // names of all directory nodes at this level (levels 1 to 3)

    public TreeLevel() {
    }

    public TreeLevel(Integer currentIndex, List<String> levelCatalog) {
        this.currentIndex = currentIndex;
        this.levelCatalog = levelCatalog;
    }

    public Integer getCurrentIndex() {
        return currentIndex;
    }

    public void setCurrentIndex(Integer currentIndex) {
        this.currentIndex = currentIndex;
    }

    public List<String> getLevelCatalog() {
        return levelCatalog;
    }

    public void setLevelCatalog(List<String> levelCatalog) {
        this.levelCatalog = levelCatalog;
    }
}
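How a per-level cursor lets the traversal resume after going back can be simulated in memory. The sketch below is an illustration under assumptions, not the post's code: `Cursor` is a simplified stand-in for TreeLevel, the `cursors` map plays the role of allTreeInfo, and a stack of names (`path`) replaces clicking a folder and navigating back. Because the core code treats every row on the third level as a leaf, the sketch only creates cursors for levels 1 and 2.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LevelCursorSketch {
    static final int MAX_LEVEL = 3;

    /** Simplified stand-in for TreeLevel: next index plus the level's directory names. */
    static final class Cursor {
        int currentIndex = 0;
        final List<String> levelCatalog;

        Cursor(List<String> levelCatalog) {
            this.levelCatalog = levelCatalog;
        }
    }

    /** Returns the directories in the order they are entered, with level and index. */
    static List<String> visitOrder(Map<String, List<String>> childDirs, String root) {
        Map<Integer, Cursor> cursors = new HashMap<>(); // level -> cursor, like allTreeInfo
        Deque<String> path = new ArrayDeque<>();        // directories entered so far
        List<String> visited = new ArrayList<>();
        path.push(root);
        int level = 1;
        while (level >= 1) {
            if (level == MAX_LEVEL) {
                path.pop();   // third level is only listed, so go straight back
                level--;
                continue;
            }
            Cursor cursor = cursors.get(level);
            if (cursor == null) { // first arrival at this level: remember its directories
                List<String> dirs =
                        childDirs.getOrDefault(path.peek(), Collections.<String>emptyList());
                cursor = new Cursor(dirs);
                cursors.put(level, cursor);
            }
            if (cursor.currentIndex < cursor.levelCatalog.size()) {
                String next = cursor.levelCatalog.get(cursor.currentIndex);
                visited.add("level " + level + ", index " + cursor.currentIndex + ": " + next);
                cursor.currentIndex++; // the next visit to this level resumes after this one
                path.push(next);       // stands in for clicking the folder
                level++;
            } else {
                cursors.remove(level); // level exhausted
                path.pop();            // stands in for navigating back
                level--;
            }
        }
        return visited;
    }

    /** The directory layout from the TreeLevel comment, under a made-up root named "share". */
    static List<String> demo() {
        Map<String, List<String>> childDirs = new HashMap<>();
        childDirs.put("share", Arrays.asList("level1"));
        childDirs.put("level1", Arrays.asList("level2.1", "level2.2"));
        childDirs.put("level2.1", Arrays.asList("level2.1_3.1", "level2.1_3.2"));
        childDirs.put("level2.2", Arrays.asList("level2.2_3.1", "level2.2_3.2"));
        return visitOrder(childDirs, "share");
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

The point of the design is that the cursor for a level survives while its subdirectories are being visited, so coming back to level 2 after `level2.1` continues with index 1 (`level2.2`) instead of starting over.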


Core code:

package com.javaxl.selenium.test;

import com.javaxl.selenium.entity.Article;
import com.javaxl.selenium.entity.TreeLevel;
import com.javaxl.selenium.utils.StringUtil;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/**
 * @author 小李飞刀
 * @site www.javaxl.com
 * @company
 * @create  2019-10-17 15:46
 *
 * Given a Baidu Cloud share link, list at most three levels of its nodes
 */
public class SeleniumTest3Level {
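    private static Integer currentLevel=1; // level currently being processed

    private static StringBuffer treeInfo=null; // accumulated tree listing

    private static Map<String, TreeLevel> allTreeInfo=new HashMap<String,TreeLevel>(); // traversal state of every level, keyed by level number

    private static boolean forward=true; // traversal direction: true while descending, false after going back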


    public static void main(String[] args) {
        SeleniumTest3Level test3Level=new SeleniumTest3Level();
        String shareUrl="https://pan.baidu.com/s/1vcYHHlG4izblfsqMZ_WsMw";
        String password="rtg5";
        System.setProperty("webdriver.chrome.driver", "D:\\initPath\\chromedriver.exe");

        Map<String,Object> preferences=new HashMap<String,Object>();
        ChromeOptions options=new ChromeOptions();
        preferences.put("profile.managed_default_content_settings.images",2);
        options.setExperimentalOption("prefs",preferences);
        WebDriver driver=new ChromeDriver(options);
        driver.get(shareUrl);

        WebDriverWait wait = new WebDriverWait(driver, 5);
        wait.until(new ExpectedCondition<Boolean>() {
            public Boolean apply(WebDriver d) {
                boolean loadcomplete = d.findElement(By.tagName("body")).isDisplayed();
                return loadcomplete;
            }
        });

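//        The extraction-code prompt is only present on protected shares. findElements returns
//        an empty list when it is missing, whereas findElement would throw NoSuchElementException.
        boolean hasPassword=false; // whether the share asks for an extraction code
        List<WebElement> prompts = driver.findElements(By.cssSelector(".pickpw.clearfix"));
        if(!prompts.isEmpty()){
            String title = prompts.get(0).getText();
//            "请输入提取码" is the prompt text shown by the page ("Please enter the extraction code")
            hasPassword = StringUtil.isNotEmpty(title) && title.contains("请输入提取码");
        }
        if(hasPassword){
            WebElement pInput = driver.findElement(By.cssSelector(".QKKaIE.LxgeIt"));
            WebElement btn = driver.findElement(By.cssSelector(".g-button-right"));
            pInput.sendKeys(password);
            btn.click();
        }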

        Article article=new Article();
        article.setShare_url(shareUrl);
        article.setPassword(password);
        try {
//            The actual extraction of the share link's contents starts here
            test3Level.generate3Level(driver,article);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            driver.quit(); // closes every window and ends the driver session, even if parsing failed
        }
//        System.out.println(article);
    }

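    /**
     * Builds the tree listing (at most three levels) for the opened share link.
     * @param driver  browser session that has already opened the share page
     * @param article the share link and extraction code being parsed
     * @return the accumulated listing, with entries separated by <br/>
     * @throws Exception if the wait is interrupted or a Selenium call fails
     */
    public static String generate3Level(WebDriver driver, Article article)throws Exception{
//        Reset the shared static state so that repeated calls (for example from a servlet) start clean
        treeInfo=new StringBuffer();
        allTreeInfo.clear();
        currentLevel=1;
        forward=true;
        Thread.sleep(2000);
//        The .EgMMec element is only rendered when the share shows a folder listing
        if(driver.findElements(By.cssSelector(".EgMMec")).isEmpty()){
//            System.out.println("file");
            dealFile(driver);
        }else{
//            System.out.println("directory");
            dealCatalog(driver);
        }
        return treeInfo.toString();
    }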

    /**
     * Handle a single shared file
     */
    public static void dealFile(WebDriver driver){
        WebDriverWait wait = new WebDriverWait(driver, 5);
        wait.until(new ExpectedCondition<Boolean>() {
            public Boolean apply(WebDriver d) {
                boolean loadcomplete = d.findElement(By.cssSelector(".file-name")).isDisplayed();
                return loadcomplete;
            }
        });
        WebElement fileNameEle = driver.findElement(By.cssSelector(".file-name"));
        treeInfo.append(fileNameEle.getText());
    }

    /**
     * Print the indentation for the given level
     */
    private static void printLine(int n) {
        for(int i=2;i<=n;i++) {
            if(i<=n-1){
                System.out.print("    ");
                treeInfo.append("         ");
            }else{
                System.out.print("|____");
                treeInfo.append("|____");
            }
        }
    }

    private static boolean backParent(WebDriver driver){
        if(currentLevel==1) {
            return true;
        }else{
            --currentLevel;
            forward=false;
            driver.navigate().back();
            dealCatalog(driver);
        }
        return false;
    }

    /**
     * Handle a directory
     */
    public static void dealCatalog(WebDriver driver) {
        WebDriverWait wait = new WebDriverWait(driver, 5);
        try {
            wait.until(new ExpectedCondition<Boolean>() {
                public Boolean apply(WebDriver d) {
                    boolean loadcomplete = d.findElement(By.cssSelector(".g-clearfix.AuPKyz")).isDisplayed();
                    return loadcomplete;
                }
            });
        } catch (Exception e) {
            allTreeInfo.remove(String.valueOf(currentLevel));
//            Timed out: return to the parent level (backParent does nothing more at level 1)
            backParent(driver);
            return;
        }
//        Every row (folders and files), each with all of its columns
        List<WebElement> allElement = driver.findElements(By.cssSelector(".g-clearfix.AuPKyz"));
//        The name cell (file or folder name) of each row
        List<WebElement> elements = driver.findElements(By.cssSelector(".filename"));

//        Number of directories at this level that still have to be traversed
        int catalogNumber = 0;
//        fileEleList collects the name cells that are already handled (files, or any row on level 3)
        List<WebElement> fileEleList = new LinkedList<WebElement>();
        for (int i = 0; i < allElement.size(); i++) {
            WebElement webElement = allElement.get(i);
//            A row is a directory only if it contains the folder icon; findElements returns an empty list for files
            boolean isDirectory = !webElement.findElements(By.cssSelector(".JS-fileicon.dir-small")).isEmpty();
//            A file, or the traversal has reached the third level
            if (!isDirectory || currentLevel == 3) {
//                The name cell of this row
                WebElement webElement1 = elements.get(i);
                if (forward || currentLevel == 3) {
                    printLine(currentLevel);
                    System.out.println(webElement1.getText());
                    treeInfo.append(webElement1.getText() + "<br/>");
                }
                fileEleList.add(webElement1);
            } else {
                catalogNumber++;
            }
        }

        for (WebElement ele : fileEleList) {
            elements.remove(ele);
        }

        if (catalogNumber == 0 && currentLevel == 1) {
            return;
        }

        if (catalogNumber == 0) {
            allTreeInfo.remove(String.valueOf(currentLevel));
            --currentLevel;
            forward = false;
            driver.navigate().back();
            dealCatalog(driver);
        } else {
            if (allTreeInfo.get(String.valueOf(currentLevel)) == null) {
                List<String> allInfo = new LinkedList<String>();
                for (WebElement e : elements) {
                    String text = e.getText();
                    allInfo.add(text);
                }
//               State of one level: the index of the current node (starting at 0) and the names of all nodes
                allTreeInfo.put(String.valueOf(currentLevel), new TreeLevel(0, allInfo));
            }
        }

        TreeLevel treeLevel = allTreeInfo.get(String.valueOf(currentLevel));
        if (treeLevel == null) {
            return;
        }
        Integer currentIndex = treeLevel.getCurrentIndex();
        if(currentIndex<elements.size()){
            WebElement webElement = elements.get(currentIndex);
            printLine(currentLevel);
            treeInfo.append(webElement.getText()+"<br/>");
            System.out.println(webElement.getText());
            String winHandleBefore = driver.getWindowHandle();
            webElement.click();
//            for(String winhandle:driver.getWindowHandles()){
//                if(winhandle.equals(winHandleBefore)){
//                    continue;
//                }
//                driver.switchTo().window(winhandle);
//                break;
//            }
            treeLevel.setCurrentIndex(currentIndex+1);
            allTreeInfo.put(String.valueOf(currentLevel),treeLevel);
            ++currentLevel;
            forward=true;
            dealCatalog(driver);
        }else{
            allTreeInfo.remove(String.valueOf(currentLevel));
            if(currentLevel==1){
                return;
            }else{
                --currentLevel;
                forward=false;
                driver.navigate().back();
                dealCatalog(driver);
            }
        }
    }
}
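dealCatalog treats a timed-out wait as a signal that there is nothing to list and goes back to the parent level. The polling idea behind such an explicit wait can be sketched in plain Java. This is an illustration only, not how WebDriverWait is implemented; `waitUntil`, its parameters, and the class name are hypothetical.

```java
import java.util.function.BooleanSupplier;

class PollingWaitSketch {
    /**
     * Checks the condition repeatedly until it holds or the timeout elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // A condition that becomes true on the third check
        boolean appeared = waitUntil(() -> ++polls[0] >= 3, 1000, 10);
        // A condition that never becomes true, so the wait times out
        boolean never = waitUntil(() -> false, 50, 10);
        System.out.println("appeared=" + appeared + ", polls=" + polls[0] + ", never=" + never);
    }
}
```

Returning a boolean instead of throwing keeps the caller's "go back on timeout" branch an ordinary `if`, whereas the code in this post gets the same effect by catching the exception that the Selenium wait throws.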


With a few small changes, the same code can be used behind a web page

package com.javaxl.selenium.test;

import com.javaxl.selenium.entity.Article;
import com.javaxl.selenium.entity.TreeLevel;
import com.javaxl.selenium.utils.StringUtil;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * @author 小李飞刀
 * @site www.javaxl.com
 * @company
 * @create  2019-10-22 15:42
 */
@WebServlet("/yunParse")
public class SeleniumServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        doPost(req,resp);
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        SeleniumTest3Level test3Level=new SeleniumTest3Level();
//        String shareUrl="https://pan.baidu.com/s/1vcYHHlG4izblfsqMZ_WsMw";
//        String password="rtg5";
        String shareUrl=req.getParameter("shareUrl");
        String password=req.getParameter("password");
        if(StringUtil.isEmpty(shareUrl) || StringUtil.isEmpty(password)){
            req.getRequestDispatcher("index.jsp").forward(req,resp);
            return;
        }
        System.setProperty("webdriver.chrome.driver", "D:\\initPath\\chromedriver.exe");

        Map<String,Object> preferences=new HashMap<String,Object>();
        ChromeOptions options=new ChromeOptions();
        preferences.put("profile.managed_default_content_settings.images",2);
        options.setExperimentalOption("prefs",preferences);
        WebDriver driver=new ChromeDriver(options);
        driver.get(shareUrl);

        WebDriverWait wait = new WebDriverWait(driver, 5);
        wait.until(new ExpectedCondition<Boolean>() {
            public Boolean apply(WebDriver d) {
                boolean loadcomplete = d.findElement(By.tagName("body")).isDisplayed();
                return loadcomplete;
            }
        });

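//        The extraction-code prompt is only present on protected shares. findElements returns
//        an empty list when it is missing, whereas findElement would throw NoSuchElementException.
        boolean hasPassword=false; // whether the share asks for an extraction code
        java.util.List<WebElement> prompts = driver.findElements(By.cssSelector(".pickpw.clearfix"));
        if(!prompts.isEmpty()){
            String title = prompts.get(0).getText();
//            "请输入提取码" is the prompt text shown by the page ("Please enter the extraction code")
            hasPassword = StringUtil.isNotEmpty(title) && title.contains("请输入提取码");
        }
        if(hasPassword){
            WebElement pInput = driver.findElement(By.cssSelector(".QKKaIE.LxgeIt"));
            WebElement btn = driver.findElement(By.cssSelector(".g-button-right"));
            pInput.sendKeys(password);
            btn.click();
        }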

        Article article=new Article();
        article.setShare_url(shareUrl);
        article.setPassword(password);
        String treeInfo = null;
        try {
//            The actual extraction of the share link's contents starts here
            treeInfo = test3Level.generate3Level(driver, article);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            driver.quit(); // closes every window and ends the driver session, even if parsing failed
        }
        req.setAttribute("msg",treeInfo);
        req.setAttribute("shareUrl",shareUrl);
        req.setAttribute("password",password);
        req.getRequestDispatcher("index.jsp").forward(req,resp);
    }
}


The index.jsp page

index.jsp 


Final result

小李飞刀_selenium



over......


Keywords: automated testing, selenium
