# web2pdf
A CLI tool to extract part of a website and create a PDF from it.
(the new version supports both class and id selectors, even some styling)
(the new version supports `--comic-mode`; read the document for more info)
------
> Let me tell you a really fun story.
> I am a manga and light novel fan.
>
> Light novels are novels that never become printed books,
> yet they are often better than many books.
> They rarely even get a PDF release,
> and some never get translated to English and remain Korean-only.
>
> So I wanted to read them as PDFs, and doing that manually is really hard and boring.
>
> So, to get to the point:
> I wrote my own tool to do it -------- web2pdf
It is totally cool: you just give it
1. **a web page**
2. **the part that contains the novel or anything else (an id or a class)**

and it does the job, turning all of it into a ***perfect PDF***.
It is called **web2pdf**.
# how to install
If you use Arch Linux (btw):
`yay -S web2pdf`
It is available in the AUR:
https://aur.archlinux.org/packages/web2pdf
If you want to compile it yourself:
1. clone the repository
2. enter the cloned directory
3. activate the virtual environment:
`cd ./bin/ && source activate`
4. install the dependencies:
`pip install requests beautifulsoup4 reportlab`
5. run it with Python and enjoy:
`python web2pdf.py`
6. you can even build its binary yourself; it is easy
# how to use?
```
usage: web2pdf.py [-h] [--id ID] [--class CLASS_NAME]
                  [--exclude EXCLUDE [EXCLUDE ...]] [--comic-mode]
                  url pdf_name

Save webpage content as PDF or images

positional arguments:
  url                 URL of the webpage to scrape
  pdf_name            Name of the PDF file to save

options:
  -h, --help          show this help message and exit
  --id ID             ID of the content to extract
  --class CLASS_NAME  Class name of the content to extract
  --exclude EXCLUDE [EXCLUDE ...]
                      Class names of elements to exclude
  --comic-mode        Save images and pdf them (like a real comic or manga)
```
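The help text above maps onto a standard `argparse` setup. A minimal sketch of what the parser presumably looks like (the actual script may wire things differently):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the flags shown in the help text above.
    parser = argparse.ArgumentParser(
        prog="web2pdf.py",
        description="Save webpage content as PDF or images",
    )
    parser.add_argument("url", help="URL of the webpage to scrape")
    parser.add_argument("pdf_name", help="Name of the PDF file to save")
    parser.add_argument("--id", help="ID of the content to extract")
    # "class" is a Python keyword, so the value lands in args.class_name.
    parser.add_argument("--class", dest="class_name",
                        help="Class name of the content to extract")
    parser.add_argument("--exclude", nargs="+",
                        help="Class names of elements to exclude")
    parser.add_argument("--comic-mode", action="store_true",
                        help="Save images and pdf them (like a real comic or manga)")
    return parser

# Example invocation (URL and class name here are made up for illustration).
args = build_parser().parse_args(
    ["https://example.com/chapter-1", "chapter1.pdf", "--class", "entry-content"]
)
print(args.class_name)  # entry-content
print(args.comic_mode)  # False
```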
- `--comic-mode`: sometimes you want to download a manga or comic from the internet.
Many sites store each chapter as a series of very long images stacked together;
downloading them one by one and turning them into a PDF by hand is tedious, sometimes practically impossible.
In those cases, run web2pdf with `--comic-mode`:
  1. it creates a directory with the same name as the PDF and saves all of the page images into it
  2. it then makes a PDF out of them
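The two steps above can be sketched roughly like this, assuming `requests` for the downloads and `reportlab` for the PDF assembly (both are in the install step above; the function names here are illustrative, not the script's actual internals):

```python
import os
from urllib.parse import urlsplit

def output_dir_for(pdf_name: str) -> str:
    # The image directory shares its name with the PDF (minus the extension).
    base, _ = os.path.splitext(pdf_name)
    return base

def download_images(image_urls, pdf_name):
    # Hypothetical helper: fetch every page image into the output directory.
    import requests  # third-party; installed in the "how to install" step
    out_dir = output_dir_for(pdf_name)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, url in enumerate(image_urls):
        ext = os.path.splitext(urlsplit(url).path)[1] or ".jpg"
        path = os.path.join(out_dir, f"{i:04d}{ext}")
        with open(path, "wb") as f:
            f.write(requests.get(url, timeout=30).content)
        paths.append(path)
    return paths

def images_to_pdf(paths, pdf_name):
    # Hypothetical helper: one image per PDF page, page sized to the image.
    from reportlab.lib.utils import ImageReader
    from reportlab.pdfgen import canvas
    pdf = canvas.Canvas(pdf_name)
    for path in paths:
        img = ImageReader(path)
        w, h = img.getSize()
        pdf.setPageSize((w, h))
        pdf.drawImage(img, 0, 0, width=w, height=h)
        pdf.showPage()
    pdf.save()

print(output_dir_for("one_piece_ch1.pdf"))  # one_piece_ch1
```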
- `--id`, `--class`, and `--exclude` are all optional; by default web2pdf makes a PDF out of the whole page.
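For the non-comic path, the id/class selection presumably boils down to a BeautifulSoup lookup like the sketch below, run here on an inline HTML snippet instead of a live page (the function name and the sample class names are made up for illustration):

```python
from bs4 import BeautifulSoup  # third-party; installed in the "how to install" step

def extract_content(html, element_id=None, class_name=None, exclude=()):
    # Pick the container by id or by class; fall back to the whole page.
    soup = BeautifulSoup(html, "html.parser")
    if element_id:
        node = soup.find(id=element_id)
    elif class_name:
        node = soup.find(class_=class_name)
    else:
        node = soup
    # Drop excluded elements (ads, share buttons, ...) before rendering.
    for cls in exclude:
        for junk in node.find_all(class_=cls):
            junk.decompose()
    return node.get_text(" ", strip=True)

html = """
<div class="entry-content">
  <p>Chapter 1: the hero wakes up.</p>
  <div class="ads">BUY NOW</div>
</div>
"""
text = extract_content(html, class_name="entry-content", exclude=["ads"])
print(text)  # Chapter 1: the hero wakes up.
```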
# what to do next
- [ ] maybe add a translation ability to the CLI tool
# end ?!
In the end, I will be happy if you share your ideas about this script with me.
Thank you so much ❤️