No, I don't really need the content, as I have already scraped that.

I need metadata, if you will, about each page.

I found a thing called Scrapy [doc.scrapy.org] that seems like it ought to be able to do what I want, but it is way beyond my abilities to implement, and not really worth it for this one job. I doubt that there will be other jobs like it.

Maybe I should just pony up and pay the online service.

// @japchap

@japchap Thanks. SiteSucker seems to do a similar job to httrack, so that end is covered. What I really want is something that will crawl the mirror and map out its structure.

I'll keep looking.

Yup. Online, there's one I tried called Content Insight [content-insight.com] that provides URL, Type, Size, Level, Title, Word Count, and Links In and Out.

I'm not even sure what search terms I should be using.

I'm working on migrating some of the content of a website into a new incarnation of the website. I'd like, if possible, to construct a spreadsheet that mirrors the structure and contents of the old website. There are online tools to do this, but they are expensive. At the moment I am downloading a mirror of the entire site using httrack. When that's done, can anyone recommend an app that will step through it and extract the information I need?
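For anyone curious how this might be done locally instead of paying for a service, here's a rough sketch of the idea in stdlib Python, assuming the httrack mirror is just a folder of .html files. All names here (`scan_mirror`, the CSV columns) are illustrative, not from any particular tool, and it only covers outbound links; counting links *in* would need a second pass over the collected rows.

```python
# Sketch: walk a local httrack mirror and write one CSV row per HTML page
# with the kind of metadata a site-inventory spreadsheet wants.
import csv
import os
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects the <title>, a rough word count, and href links of one page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.words = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        self.words += len(data.split())  # crude count over all text nodes

def scan_mirror(root, out_csv):
    """Walk the mirror directory and write one CSV row per HTML file."""
    fields = ["url", "size_bytes", "level", "title", "word_count", "links_out"]
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            parser = PageParser()
            with open(path, encoding="utf-8", errors="replace") as fh:
                parser.feed(fh.read())
            rows.append({
                "url": rel.replace(os.sep, "/"),
                "size_bytes": os.path.getsize(path),
                "level": rel.count(os.sep),  # depth below the mirror root
                "title": parser.title,
                "word_count": parser.words,
                "links_out": len(parser.links),
            })
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The resulting CSV opens straight into a spreadsheet, which is roughly what the online services deliver.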

What's my number?


In case anyone is interested, I finally wrote up the recipe for my carrot cake [fornacalia.com].

@kdfrawg Oh yeah. Fully confirmed now as I snuffle myself to sleep, pausing only to down a hot Lemsip.

Oh crap.

Oh. I missed the link to what you had in mind.

It appeals to me, not least because I am very bad at just listening to people have their say without trying to "solve" whatever it is they're talking about.

Could do. What did you have in mind?

One thing I might try to do today is record a DDP explaining that I too am not doing DDP this August.