Requirements
A canvas that renders a mind map, similar to https://echarts.apache.org/examples/en/editor.html?c=tree-basic . Think of it as an MDN menu 🤣
- Left mouse button (click) on a node: collapse or expand it
- Right mouse button (contextmenu) on a node: navigate to the corresponding page
Demo: http://hojondo.com/MDN_MIND_MAPPING/
To do: filter, search, nav
Target JSON format
interface PageNode {
  name: string;
  link: string;
  childrenPage?: Array<PageNode>;
}
Example
[
  {
    "name": "Standard built-in objects",
    "link": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects",
    "childrenPage": [
      {
        "name": "Text processing",
        "link": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects#text_processing",
        "childrenPage": [
          {
            "name": "String",
            "link": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String",
            "childrenPage": [
              {
                "name": "Static properties",
                "link": "",
                "childrenPage": []
              },
              {
                "name": "Static methods",
                "link": "",
                "childrenPage": []
              },
              {
                "name": "Instance properties",
                "link": "",
                "childrenPage": []
              },
              {
                "name": "Instance methods",
                "link": "",
                "childrenPage": [
                  {
                    "name": "match",
                    "link": "granularity goes down to properties // later we could append #parameters #return #examples / use cases worth special attention, TBD..."
                  },
                  {}
                ]
              }
            ]
          },
          {
            "name": "RegExp",
            "link": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp"
          }
        ]
      },
      {
        "name": "Keyed collections",
        "link": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects#keyed_collections",
        "childrenPage": [
          // Map,
          // Set,
          // WeakMap,
          // WeakSet
        ]
      }
    ]
  }
  // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference#statements
]
Mapping out the layout of each MDN page
Root: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects
Pages under this root have a relatively uniform structure.
bread-crumb-length === 6
Example: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference
URL criterion: (url.match(/(?<=^https\:\/\/).*/)??[''])[0].split('/').length === 6
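The URL criterion above can also be computed without a lookbehind regex. A minimal sketch, assuming a helper named breadcrumbLength (hypothetical, not part of the original code) and Node's built-in URL class. It counts the host plus every non-empty path segment, which should agree with the regex for URLs without a trailing slash:

```javascript
// Hypothetical helper: counts host + path segments of a page URL.
function breadcrumbLength(url) {
  const { pathname } = new URL(url);
  // Drop empty segments so a leading or trailing slash does not inflate the count.
  const segments = pathname.split("/").filter(Boolean);
  return 1 + segments.length; // 1 accounts for the host
}

const base = "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference";
console.log(breadcrumbLength(base)); // 6
console.log(breadcrumbLength(base + "/Global_Objects")); // 7
console.log(breadcrumbLength(base + "/Global_Objects/String")); // 8
```

A #hash does not change the result, because the URL class keeps it out of pathname.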
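h2 > a {
  // points to pages with bread-crumb-length === 7
  /** including
    Global_Objects;
    statements;
    expressions_and_operators;
    functions;
    additional_reference_pages
  */
}
There are only 5 h2 elements, and the subpages reached from this entry point are not structured consistently.
Skipped for now.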
bread-crumb-length === 7
Example: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects
URL criterion: (url.match(/(?<=^https\:\/\/).*/)??[''])[0].split('/').length === 7
All h3 categories and their subpages:
h3 > a {
  // top-level category, a #hash URL, e.g. #text_processing
}
h3 + div li > a {
  // subpages of the category
  // TODO: exclude links preceded by an svg icon (either non-standard or deprecated)
}
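Mapped to Node.js _pseudocode_, using cheerio as $ (assuming $ = cheerio.load(html)):
const childNodes = $("h3 > a");
// .text() / .attr() on a multi-element selection do not return one value per element, so map over each one
const childNodesName = childNodes.map((_, el) => $(el).text()).get();
const childNodesLink = childNodes.map((_, el) => $(el).attr("href")).get(); // "#hash"; later prefix it with the current URL
const grandChildNodes = $("h3 + div li > a");
const grandChildNodesName = grandChildNodes.map((_, el) => $(el).children("code").text()).get();
const grandChildNodesLink = grandChildNodes.map((_, el) => $(el).attr("href")).get(); // later prefix it with 'https://developer.mozilla.org'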
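Both kinds of href mentioned in the comments (a bare #hash and a path starting with /) can be resolved the same way. A minimal sketch, assuming a helper named toAbsoluteLink (hypothetical) and Node's built-in URL class, which resolves a relative reference against a base URL:

```javascript
// Hypothetical helper: resolves an href scraped from a page against that page's URL.
function toAbsoluteLink(href, pageUrl) {
  // new URL(relative, base) handles "#hash", "/absolute/path" and full URLs alike.
  return new URL(href, pageUrl).href;
}

const pageUrl =
  "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects";
console.log(toAbsoluteLink("#text_processing", pageUrl));
console.log(
  toAbsoluteLink("/en-US/docs/Web/JavaScript/Reference/Global_Objects/String", pageUrl)
);
```

This avoids hard-coding the 'https://developer.mozilla.org' prefix, at the cost of having to pass the current page URL along with every href.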
bread-crumb-length === 8
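Example: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String
URL criterion: (url.match(/(?<=^https\:\/\/).*/)??[''])[0].split('/').length === 8
h2 > a {
  // top-level category, a #hash URL, e.g. #static_methods
}
h2 + div dt > a {
  // subpages of the category
  // TODO: exclude links followed by an svg icon (either non-standard or deprecated)
}
Mapped to Node.js _pseudocode_, using cheerio as $ (assuming $ = cheerio.load(html)):
const childNodes = $("h2 > a");
// map over each element to get one name and one link per node
const childNodesName = childNodes.map((_, el) => $(el).text()).get();
const childNodesLink = childNodes.map((_, el) => $(el).attr("href")).get(); // "#hash"; later prefix it with the current URL
const grandChildNodes = $("h2 + div dt > a");
const grandChildNodesName = grandChildNodes.map((_, el) => $(el).children("code").text()).get();
const grandChildNodesLink = grandChildNodes.map((_, el) => $(el).attr("href")).get(); // later prefix it with 'https://developer.mozilla.org'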
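Draft code
Things to watch out for:
- The same URL may be reached via different paths, so URLs need to be deduplicated;
- The CSS selectors are imprecise and may pick up unintended elements; following those into a page will most likely cause an error;
- The classification above does not fit every URL, so some pages must be excluded if the root does not start from Global_Objects:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators matches h3 + div dt > a
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements matches h3 + div dt > a
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions … is structured just like the String level
In short, the root starts only from Global_Objects. Later, consider deep-crawling from each of the sibling roots separately.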
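The first two points above (deduplication, and pages that fail to parse) could be handled in the recursion itself. A minimal sketch, not the project's actual crawler: crawl and getChildren are hypothetical names, and getChildren(link) is assumed to return an array of { name, link } for one page (in a real run it would fetch and parse the HTML). Here a small in-memory fake site stands in for MDN:

```javascript
// Hypothetical recursive crawl with a shared "visited" set for deduplication.
async function crawl(node, getChildren, visited = new Set()) {
  // Nothing to fetch, or this URL was already crawled via another path.
  if (!node.link || visited.has(node.link)) return node;
  visited.add(node.link);
  let children = [];
  try {
    children = await getChildren(node.link);
  } catch (err) {
    // An imprecise selector may lead to a page that cannot be parsed; skip it.
    console.warn(`skipped ${node.link}: ${err.message}`);
  }
  node.childrenPage = [];
  for (const child of children) {
    node.childrenPage.push(await crawl(child, getChildren, visited));
  }
  return node;
}

// Fake site (an assumption for the demo): "/c" is reachable from both "/a" and "/b".
const fakeSite = {
  "/a": [{ name: "b", link: "/b" }, { name: "c", link: "/c" }],
  "/b": [{ name: "c", link: "/c" }],
  "/c": [],
};
const calls = [];
const fakeGetChildren = async (link) => {
  calls.push(link);
  if (!(link in fakeSite)) throw new Error("unknown page");
  return fakeSite[link].map((child) => ({ ...child }));
};

const result = crawl({ name: "a", link: "/a" }, fakeGetChildren);
result.then((tree) => console.log(JSON.stringify(tree), calls));
```

In this sketch a page seen a second time is kept as a leaf without children rather than dropped, so the tree still shows every path to it while each URL is fetched only once.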
Getting the JSON with Node.js
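const https = require("https");
const fs = require("fs");
const cheerio = require("cheerio");

const rootUrl =
  "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects";

https
  .get(rootUrl, (res) => {
    res.setEncoding("utf8"); // decode chunks as text so multi-byte characters are not split
    let html = "";
    res.on("data", (chunk) => {
      html += chunk;
    });
    res.on("end", () => {
      // keep a copy of the raw HTML for debugging
      fs.writeFile("crawler.txt", html, (err) => {
        if (err) console.error(err);
      });
      filterHtml(html);
    });
  })
  .on("error", (err) => {
    console.error("crash!", err);
  });

// TODO: filterHtml should turn the HTML into .json
// bread-crumb-length === 7 or 8?
// recursion
function filterHtml(html) {
  const $ = cheerio.load(html);
  // TODO: not implemented yet
}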
Frontend with echarts
https://github.com/Hojondo/MDN_MIND_MAPPING
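On the frontend, the crawled JSON has to be reshaped before it is handed to the ECharts tree series, which reads child nodes from a children key rather than childrenPage. A minimal sketch under that assumption; toTreeData and the option values are hypothetical and not taken from the repository above:

```javascript
// Hypothetical converter from PageNode to the shape the ECharts tree series expects.
function toTreeData(page) {
  return {
    name: page.name,
    link: page.link, // extra field, kept so an event handler can read it later
    children: (page.childrenPage ?? [])
      .filter((child) => child && child.name) // drop empty {} placeholders
      .map(toTreeData),
  };
}

const sample = {
  name: "String",
  link: "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String",
  childrenPage: [{ name: "match", link: "" }, {}],
};

const option = {
  series: [
    {
      type: "tree",
      data: [toTreeData(sample)],
      expandAndCollapse: true, // left click toggles a node
      initialTreeDepth: 1,
    },
  ],
};
console.log(JSON.stringify(option, null, 2));

// In the browser (assumes `chart` is an initialized echarts instance):
// chart.setOption(option);
// chart.on("contextmenu", (params) => {
//   if (params.data && params.data.link) window.open(params.data.link);
// });
```

The browser's own context menu will probably need to be suppressed separately for the right-click navigation to feel clean.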