@literary That is great. I'm glad they were big enough to do that.
@matigo I just looked again, and I am a fool. At first, all I could see was monthly subscription pricing, and even one month would be too much. But there's also a batch price that should do what I need for a much more reasonable amount.
// @japchap
@matigo No, I don't really need the content, as I have already scraped that.
I need metadata, if you will, about each page.
I found a thing called Scrapy [doc.scrapy.org] that seems like it ought to be able to do what I want, but it is way beyond my abilities to implement, and not really worth it for this one job. I doubt that there will be other jobs like it.
Maybe I should just pony up and pay the online service.
// @japchap
@japchap Thanks. SiteSucker seems to do a similar job to httrack, so that end is covered. What I really want is something that will crawl the mirror and map out its structure.
I'll keep looking.
@matigo Yup. Online, there's one I tried called Content Insight [content-insight.com] that provides URL, Type, Size, Level, Title, Word Count and Links In and Out.
I'm not even sure what search terms I should be using.
I'm working on migrating some of the content of a website into a new incarnation of the website. I'd like, if possible, to construct a spreadsheet that mirrors the structure and contents of the old website. There are online tools to do this, but they are expensive. At the moment I am downloading a mirror of the entire site using httrack. When that's done, can anyone recommend an app that will step through it and extract the information I need?
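For what it's worth, a job like this might not need a dedicated app. Below is a minimal sketch, assuming the httrack mirror is a plain folder of files on disk and that Python's standard library is enough. It walks the folder and records path, type, size, level (folder depth), title, word count and outgoing link count for each file. The names `PageInfo`, `inventory` and `write_csv` are made up for illustration, and "Links In" is left out because it would need the links resolved against each other.

```python
import csv
import mimetypes
import os
from html.parser import HTMLParser


class PageInfo(HTMLParser):
    """Collects the title, a rough word count, and link targets of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.words = 0
        self.links = []
        self._in_title = False
        self._skip = 0  # non-zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif not self._skip:
            self.words += len(data.split())


def inventory(root):
    """Return one dict per file found under the mirror folder `root`."""
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            row = {
                "path": rel,
                "type": mimetypes.guess_type(name)[0] or "unknown",
                "size": os.path.getsize(path),
                "level": rel.count("/"),  # 0 = top of the mirror
                "title": "",
                "words": 0,
                "links_out": 0,
            }
            if name.lower().endswith((".html", ".htm")):
                parser = PageInfo()
                with open(path, encoding="utf-8", errors="replace") as f:
                    parser.feed(f.read())
                row.update(title=parser.title, words=parser.words,
                           links_out=len(parser.links))
            rows.append(row)
    return rows


def write_csv(rows, out_path):
    """Write the inventory as a CSV file that a spreadsheet app can open."""
    fields = ["path", "type", "size", "level", "title", "words", "links_out"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_csv(inventory("mirror"), "site.csv")` would produce the spreadsheet, where `mirror` stands in for wherever httrack saved the site. The word count is approximate, since it counts every piece of text outside scripts and styles, menus and footers included.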
In case anyone is interested, I finally wrote up the recipe for my carrot cake [fornacalia.com].
@kdfrawg Oh yeah. Fully confirmed now as I snuffle myself to sleep, pausing only to down a hot Lemsip.