Avoid wget appending index.html to links

- March 15, 2015

I am trying to make a static HTML copy of a WordPress site that I can upload somewhere else, such as GitHub Pages.

I use this command:

Option 1:

wget -k -r -l 1000 -p -N -F -nH -P ./website http://example.com/website
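For readability, here is the same Option 1 download written with wget's long option names. This is a sketch assuming the short flags are -k -r -l 1000 -p -N -F -nH -P; the function name mirror_option1 is made up, and wrapping the command in a function just means nothing is downloaded until you call it.

```shell
#!/usr/bin/env bash
# Option 1 with long option names (mirror_option1 is a made-up name):
#   -k      --convert-links        rewrite links to point at the local files
#                                  (this is what turns "blog/" into "blog/index.html")
#   -r      --recursive            follow links
#   -l 1000 --level=1000           maximum recursion depth
#   -p      --page-requisites      also fetch CSS, JS and images
#   -N      --timestamping         don't re-download unless the remote file is newer
#   -F      --force-html           only has an effect with -i (URL list read from a file)
#   -nH     --no-host-directories  no example.com/ directory level
#   -P      --directory-prefix     save everything under ./website
mirror_option1() {
  wget --convert-links --recursive --level=1000 --page-requisites \
    --timestamping --force-html --no-host-directories \
    --directory-prefix=./website \
    http://example.com/website
}
```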

It downloads the entire site etc. The main issue is that it adds "index.html" to every single link. I understand that this is needed to view the site locally, but it is not required on a static website host.

So is there a way to tell wget not to modify the links and not to add index.html to them?

For example, it creates:

<a href="blog/2015/07/11/hello-world/index.html">hello world!</a>

for the default WordPress "Hello world!" post.

Option 2:

Use the mirroring command without -k (convert links):

wget -E -m -p -F -nH -P ./website http://example.com/website

Then it does not append index.html, but it keeps the domain name in the links.

But it crawls http://example.com and indexes everything there. I don't want that; I want /website to be the root (because it is a WordPress multisite). How can I fix this?
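For keeping the crawl inside /website, wget's --no-parent (-np) option is the usual tool: it stops recursion from ascending above the starting directory. Below is a minimal sketch that adds it to the Option 2 flags; mirror_subsite is a hypothetical name. Note the trailing slash on the URL: per the wget manual, for HTTP the slash is how wget knows /website/ is a directory, so without it --no-parent has much less effect. Shared multisite assets (for example under /wp-content/) live outside /website/, so it is worth checking that they still come down with your wget version.

```shell
#!/usr/bin/env bash
# Sketch: mirror only the /website/ subsite (mirror_subsite is a made-up name).
# Same flags as Option 2 (-E -m -p -F -nH -P) plus --no-parent, so recursion
# never goes above the starting directory. No --convert-links, so wget does
# not add index.html to the links.
mirror_subsite() {
  local url="${1:-http://example.com/website/}"  # trailing slash matters for --no-parent
  local dest="${2:-./website}"
  wget --adjust-extension --mirror --page-requisites --force-html \
    --no-parent --no-host-directories \
    --directory-prefix="$dest" \
    "$url"
}
```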

I want to rewrite the hostname instead of stripping or keeping it. It should go from http://example.com/website/ (the WordPress multisite) to http://example.org/. Is that possible, or do I need to run sed/awk on the files after the download?
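If you do post-process the files, the hostname rewrite is a single substitution per file. A minimal sketch, assuming GNU sed (for -i without a backup suffix) and plain URLs; rewrite_host is a hypothetical helper, and its defaults are just the two URLs from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite absolute links OLD -> NEW in every .html file
# under DIR. Assumes GNU sed. Only '.' in OLD is escaped, so OLD should be a
# plain URL without other regex metacharacters or '#'; NEW should not contain
# '&', '\' or '#'.
rewrite_host() {
  local dir="${1:-./website}"
  local old="${2:-http://example.com/website/}"
  local new="${3:-http://example.org/}"
  local old_re
  # Escape dots so "example.com" does not also match "exampleXcom".
  old_re="$(printf '%s\n' "$old" | sed -e 's/[.]/\\./g')"
  # '#' as the s-command delimiter, so the slashes in URLs need no escaping.
  find "$dir" -type f -name '*.html' -exec sed -i -e "s#${old_re}#${new}#g" {} +
}
```

Usage: `rewrite_host ./website` after the download finishes.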

I faced a similar problem and solved it by postprocessing with sed.

This replaces occurrences of /index.html' with /' (as the comment above points out, a redirect happens anyway if the trailing slash is missing, so I added it =)

find ./ -type f -exec sed -i -e "s#/index\.html'#/'#g" {} \;
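To see what that expression does without touching any files, it can be run as a filter on a sample line. A small sketch; strip_index_sq is a made-up name. Note that the rule as written only matches single-quoted attribute values, so a double-quoted href like the one in the question passes through unchanged.

```shell
#!/usr/bin/env bash
# The substitution from the find/sed command above, applied to stdin instead
# of editing files in place (strip_index_sq is a made-up name).
strip_index_sq() {
  sed -e "s#/index\.html'#/'#g"
}

# Example:
#   printf '%s\n' "<a href='blog/2015/07/11/hello-world/index.html'>Hello world!</a>" | strip_index_sq
#   -> <a href='blog/2015/07/11/hello-world/'>Hello world!</a>
```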

And this monster replaces occurrences of "index.html" or 'index.html' (or even "index.html' or 'index.html" ...) with ".":

find ./ -type f -exec sed -i -e "s/[\"']index\.html[\"']/\".\"/g" {} \;
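The two rules can also be folded into one pass that handles both quote styles, which covers double-quoted links like the question's href="blog/2015/07/11/hello-world/index.html". A sketch, assuming GNU sed; strip_index_html is a hypothetical name. It only edits *.html files (-type f alone would also run sed over images and other binaries), and find's {} + form starts sed once per batch of files rather than once per file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: in every .html file under DIR,
#   .../index.html"  ->  .../"    (either quote character)
#   "index.html"     ->  "."
# Assumes GNU sed for in-place editing (-i) and -E for extended regexes.
strip_index_html() {
  local dir="${1:-./website}"
  find "$dir" -type f -name '*.html' -exec sed -i -E \
    -e "s#/index\.html([\"'])#/\1#g" \
    -e "s#([\"'])index\.html([\"'])#\1.\2#g" \
    {} +
}
```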

You can check what sed is matching, e.g. in index.html, with this command:

sed -n "s/[\"']index\.html[\"']/\".\"/p" index.html
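To preview across the whole download instead of a single file, grep can list every line that still links to index.html before anything is edited. A small sketch, assuming GNU grep (for --include) and ./website as the download directory; preview_index_links is a made-up name.

```shell
#!/usr/bin/env bash
# Dry run: print file:line:text for every line under DIR where "index.html"
# is immediately followed by a quote, i.e. the lines the sed rewrites target.
# Nothing is modified. (preview_index_links is a made-up name.)
preview_index_links() {
  local dir="${1:-./website}"
  grep -rnE --include='*.html' "index\.html[\"']" "$dir"
}
```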

Hope you find this useful.
