No, I don't really need the content, as I have already scraped that.

I need metadata, if you will, about each page.

I found a thing called Scrapy [doc.scrapy.org] that seems like it ought to be able to do what I want, but it is way beyond my abilities to implement, and not really worth it for this one job. I doubt that there will be other jobs like it.

Maybe I should just pony up and pay the online service.

// @japchap

@japchap Thanks. SiteSucker seems to do a similar job to httrack, so that end is covered. What I really want is something that will crawl the mirror and map out its structure.

I'll keep looking.

Yup. Online, there's one I tried called Content Insight [content-insight.com] that provides URL, Type, Size, Level, Title, Word Count, and Links In and Out.

I'm not even sure what search terms I should be using.

I'm working on migrating some of the content of a website into a new incarnation of the website. I'd like, if possible, to construct a spreadsheet that mirrors the structure and contents of the old website. There are online tools to do this, but they are expensive. At the moment I am downloading a mirror of the entire site using httrack. When that's done, can anyone recommend an app that will step through it and extract the information I need?
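For anyone curious how this might be done locally instead of paying for a service, here's a rough sketch of the idea in stdlib Python, assuming the httrack mirror is just a folder of .html files. All names here (`scan_mirror`, the CSV columns) are illustrative, not from any particular tool, and it only covers outbound links; counting links *in* would need a second pass over the collected rows.

```python
# Sketch: walk a local httrack mirror and write one CSV row per HTML page
# with the kind of metadata a site-inventory spreadsheet wants.
import csv
import os
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects the <title>, a rough word count, and href links of one page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.words = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        self.words += len(data.split())  # crude count over all text nodes

def scan_mirror(root, out_csv):
    """Walk the mirror directory and write one CSV row per HTML file."""
    fields = ["url", "size_bytes", "level", "title", "word_count", "links_out"]
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            parser = PageParser()
            with open(path, encoding="utf-8", errors="replace") as fh:
                parser.feed(fh.read())
            rows.append({
                "url": rel.replace(os.sep, "/"),
                "size_bytes": os.path.getsize(path),
                "level": rel.count(os.sep),  # depth below the mirror root
                "title": parser.title,
                "word_count": parser.words,
                "links_out": len(parser.links),
            })
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The resulting CSV opens straight into a spreadsheet, which is roughly what the online services deliver.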

What's my number?


In case anyone is interested, I finally wrote up the recipe for my carrot cake [fornacalia.com].

@kdfrawg Oh yeah. Fully confirmed now as I snuffle myself to sleep, pausing only to down a hot Lemsip.

Oh crap.

Oh. I missed the link to what you had in mind.

It appeals to me, not least because I am very bad at just listening to people have their say without trying to "solve" whatever it is they're talking about.

Could do. What did you have in mind?

One thing I might try to do today is record a DDP explaining that I too am not doing DDP this August.