That is great. I'm glad they were big enough to do that.

For what it is worth, I think you're still wrong.

I just looked again, and I am a fool. To begin with, all I could see was monthly subscription pricing, and even one month would be too much. But there's also a batch price that should cover what I need for a much more reasonable amount.

// @japchap

No, I don't really need the content, as I have already scraped that.

I need metadata, if you will, about each page.

I found a thing called Scrapy [doc.scrapy.org] that seems like it ought to be able to do what I want, but it is way beyond my abilities to implement, and not really worth it for this one job. I doubt that there will be other jobs like it.
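
Having skimmed the Scrapy tutorial, the spider itself at least looks like it would only be a handful of lines. Very roughly, and completely untested (example.com and the file names are just stand-ins for the real thing):

import scrapy

class SiteMetadataSpider(scrapy.Spider):
    name = "site_metadata"
    # Placeholders: swap in the real domain and start page.
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com/"]

    def parse(self, response):
        # One row of metadata per page.
        text = " ".join(response.css("body *::text").getall())
        yield {
            "url": response.url,
            "title": (response.css("title::text").get() or "").strip(),
            "size_bytes": len(response.body),
            "word_count": len(text.split()),
            "links_out": len(response.css("a::attr(href)").getall()),
        }
        # Follow internal links so the whole site gets visited.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

Apparently you then run it with something like scrapy runspider site_metadata_spider.py -o pages.csv and it writes the rows straight out as a CSV.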

Maybe I should just pony up and pay the online service.

// @japchap

@japchap Thanks. SiteSucker seems to do a similar job to httrack, so that end is covered. What I really want is something that will crawl the mirror and map out its structure.

I'll keep looking.

Yup. Online, there's one I tried called Content Insight [content-insight.com] that provides URL, Type, Size, Level, Title, Word Count, and Links In and Out for each page.
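
Mind you, most of those columns could probably be pulled out of the httrack mirror with a short Python script rather than paying for the service. Something along these lines, assuming BeautifulSoup is installed and the mirror sits in a folder called mirror; I've left off Type (it's really just the file extension) and Links In (that would need a second pass over everyone else's Links Out):

import csv
import os
from bs4 import BeautifulSoup   # pip install beautifulsoup4

MIRROR_ROOT = "mirror"              # placeholder: wherever httrack put the files
OUTPUT_CSV = "site_structure.csv"   # placeholder output name
FIELDS = ["url", "size_bytes", "level", "title", "word_count", "links_out"]

rows = []
for dirpath, _, filenames in os.walk(MIRROR_ROOT):
    for name in filenames:
        if not name.lower().endswith((".html", ".htm")):
            continue
        path = os.path.join(dirpath, name)
        rel = os.path.relpath(path, MIRROR_ROOT)
        with open(path, encoding="utf-8", errors="ignore") as f:
            soup = BeautifulSoup(f, "html.parser")
        text = soup.get_text(" ", strip=True)
        rows.append({
            "url": rel.replace(os.sep, "/"),
            "size_bytes": os.path.getsize(path),
            "level": rel.count(os.sep),     # how deep below the site root
            "title": soup.title.get_text(strip=True) if soup.title else "",
            "word_count": len(text.split()),
            "links_out": len(soup.find_all("a", href=True)),
        })

with open(OUTPUT_CSV, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

No idea how well that copes with httrack's rewritten links, but that's the general shape of the thing.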

I'm not even sure what I should be searching for (what search terms to use, I mean).

I'm working on migrating some of the content of a website into its new incarnation. I'd like, if possible, to construct a spreadsheet that mirrors the structure and contents of the old site. There are online tools to do this, but they are expensive. At the moment I am downloading a mirror of the entire site using httrack. When that's done, can anyone recommend an app that will step through it and extract the information I need?

What's my number?

[attached image: slice.png]

In case anyone is interested, I finally wrote up the recipe for my carrot cake [fornacalia.com].

@kdfrawg Oh yeah. Fully confirmed now as I snuffle myself to sleep, pausing only to down a hot Lemsip.