I want to rip the contents of a paid website, but I have to log in through a web page on their site to get access.

Does anyone have any good tools for Windows for that?

I’m guessing that any such tool must have a built-in browser, or be a browser plugin, for it to work.

@[email protected]

yt-dlp can download videos using cookies from your web browser. I haven’t gotten it to work myself, but there’s probably a tutorial somewhere.
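For reference, the yt-dlp option for this is `--cookies-from-browser BROWSER`, and `--cookies FILE` takes an exported cookies.txt instead. A minimal Python sketch that only assembles and runs the command, assuming yt-dlp is installed and on PATH and that you are logged in to the site in Firefox; the URL and output folder are placeholders, and yt-dlp only helps for pages where it can actually find the video or audio:

```python
import shutil
import subprocess


def ytdlp_command(url: str, browser: str = "firefox",
                  output_dir: str = "downloads") -> list[str]:
    """Assemble a yt-dlp command that reuses a local browser's login cookies."""
    return [
        "yt-dlp",
        # Read cookies directly from the browser profile you are logged in with.
        "--cookies-from-browser", browser,
        # Output template: <output_dir>/<title> [<id>].<ext>
        "-o", f"{output_dir}/%(title)s [%(id)s].%(ext)s",
        url,
    ]


def download(url: str, browser: str = "firefox") -> int:
    """Run yt-dlp and return its exit code (0 means success)."""
    if shutil.which("yt-dlp") is None:
        raise FileNotFoundError("yt-dlp was not found on PATH")
    return subprocess.run(ytdlp_command(url, browser)).returncode


if __name__ == "__main__":
    # Placeholder URL: use a page from the site you are logged in to.
    print(ytdlp_command("https://example.com/members/video/123"))
```

On Windows, reading cookies from a Chromium-based browser can fail (for example while the browser is running); exporting a cookies.txt and passing it with `--cookies` is the usual fallback.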

∟⊔⊤∦∣≶

What’s the site? Often you can find specially designed tools on GitHub for this purpose that handle all the logins, etc.

DaGeek247

Speaking of likely-to-work browser plugins: https://addons.mozilla.org/en-US/firefox/addon/downthemall/

I’ve used DownThemAll heaps in the past; it works well.

@[email protected]

Unless you have an account, there’s no easy way to get access to the content on the page. Once you have an account, there’s technically nothing stopping you from just saving the HTML file to your computer.

Something else you can try, assuming you don’t have an account, is to turn off JavaScript. If the site lets you partially load the content and then asks you to create an account to read more, it usually hides the rest by having JavaScript add an opaque overlay. With JavaScript disabled, nothing runs to add the overlay, and you can keep reading.

m-p{3}

Depending on the website, there might be tools tailored specifically to it that will extract the content you’re looking for. They’re likely to be command-line based, though, and you’ll probably have to export your cookies so the tools can act as if you were logged in to your account from outside your browser.

Is it too much to ask which website?
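Exporting cookies usually means saving a `cookies.txt` file in the Netscape format with a browser extension. As a sketch of what a command-line tool then does with that file, here is a standard-library Python version; the file name and the User-Agent string are assumptions:

```python
import http.cookiejar
import urllib.request


def load_cookie_jar(path: str) -> http.cookiejar.MozillaCookieJar:
    """Load a Netscape-format cookies.txt exported from a logged-in browser."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies (empty expiry field); logins are often stored that way.
    jar.load(ignore_discard=True)
    return jar


def make_opener(jar: http.cookiejar.CookieJar) -> urllib.request.OpenerDirector:
    """Build an opener that sends the jar's cookies with every matching request."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Some sites reject Python's default User-Agent, so send a browser-like one.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener


def fetch(opener: urllib.request.OpenerDirector, url: str) -> bytes:
    """Download one URL as the logged-in user."""
    with opener.open(url, timeout=30) as resp:
        return resp.read()
```

If `fetch` returns the login page instead of the content, the cookies have probably expired or belong to a different domain than the one being requested.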

@[email protected]
creator

I have an account, so that’s not a problem. The problem is how to automate going into every little content page and downloading the content, including the hi-res files.
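One way to automate the "visit every content page" part is a small breadth-first crawler that stays on the site’s host and collects links to downloadable files. A sketch, assuming the pages link to the files through ordinary `href`/`src` attributes (pages built entirely by JavaScript won’t expose them) and that `FILE_EXTENSIONS` is adjusted to whatever the hi-res files actually are; `fetch_html` is a hypothetical callable you supply that fetches a URL with your login cookies:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Assumption: the "content" files end in one of these extensions; adjust for the site.
FILE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".pdf", ".zip", ".mp4")


class LinkExtractor(HTMLParser):
    """Collect every href/src attribute value found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def classify_links(page_url, html):
    """Split a page's links into same-host pages to visit and files to download."""
    parser = LinkExtractor()
    parser.feed(html)
    host = urlparse(page_url).netloc
    pages, files = set(), set()
    for raw in parser.links:
        url, _fragment = urldefrag(urljoin(page_url, raw))
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or parsed.netloc != host:
            continue  # other sites, mailto:, javascript:, etc.
        if parsed.path.lower().endswith(FILE_EXTENSIONS):
            files.add(url)
        else:
            pages.add(url)
    return pages, files


def crawl(start_url, fetch_html, max_pages=500):
    """Breadth-first crawl from start_url; return the set of file URLs found.

    fetch_html(url) -> str is supplied by you and must send your login cookies.
    """
    seen, queue = {start_url}, deque([start_url])
    found_files, fetched = set(), 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        pages, files = classify_links(url, fetch_html(url))
        fetched += 1
        found_files |= files
        for page in sorted(pages - seen):
            seen.add(page)
            queue.append(page)
    return found_files
```

The `seen` set keeps the crawler from revisiting pages, and `max_pages` is a safety cap. The returned URLs can be written one per line to a text file and handed to a downloader, for example `wget --load-cookies cookies.txt -i urls.txt`.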

tekchic

I’m on a Mac and use SiteSucker, so I know that’s not super helpful, but on Windows you could try wget or WebCopy: https://www.cyotek.com/cyotek-webcopy / https://gnuwin32.sourceforge.net/packages/wget.htm
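With wget, the login part is handled by `--load-cookies`, which reads a Netscape-format cookies.txt exported from the browser you logged in with; the rest is the usual mirroring flags. A Python sketch that assembles the command, assuming wget is on PATH. Note that the GnuWin32 build linked above is quite old (1.11.x), and `--reject-regex` needs wget 1.14 or newer:

```python
import subprocess
from typing import Optional


def wget_mirror_command(start_url: str, cookie_file: str = "cookies.txt",
                        wait_seconds: int = 1,
                        reject_regex: Optional[str] = None) -> list[str]:
    """Assemble a wget command that mirrors a logged-in area of a site."""
    cmd = [
        "wget",
        "--load-cookies", cookie_file,  # Netscape cookies.txt from your logged-in browser
        "--mirror",                     # recursive, unlimited depth, timestamping
        "--page-requisites",            # also fetch the images/CSS each page needs
        "--convert-links",              # rewrite links so the copy browses offline
        "--no-parent",                  # never climb above the start URL's directory
        f"--wait={wait_seconds}",       # pause between requests to go easy on the server
    ]
    if reject_regex:
        # Skip matching URLs, e.g. a logout link that would end the session.
        cmd.append(f"--reject-regex={reject_regex}")
    cmd.append(start_url)
    return cmd


def run_mirror(start_url: str, **kwargs) -> int:
    """Run the assembled command and return wget's exit code."""
    return subprocess.run(wget_mirror_command(start_url, **kwargs)).returncode
```

`reject_regex` matters because a recursive crawl follows a "Log out" link like any other and ends the session; a value like `"logout|signout"` is only a guess, so check the site’s actual links.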

@[email protected]
creator

WebCopy looks promising, if I can get its crawler to work with this site’s authentication…

@[email protected]

Disabling JavaScript might also block the page content from loading…

I would assume it’s being fetched by JavaScript through an API.

That is fairly common.
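If that’s the case, it can be simpler to skip the HTML and call the same API directly: find the request in the browser’s Network tab, copy its URL and its `Cookie` request header, and replay it. A minimal sketch; the endpoint is hypothetical, and some sites also require extra headers such as an `Authorization` token or a CSRF header, which you would copy the same way and pass in `extra_headers`:

```python
import json
import urllib.request
from typing import Optional


def build_api_request(api_url: str, cookie_header: str,
                      extra_headers: Optional[dict] = None) -> urllib.request.Request:
    """Recreate a request seen in the browser's Network tab."""
    headers = {
        "Cookie": cookie_header,        # copied verbatim from the request headers
        "Accept": "application/json",
        "User-Agent": "Mozilla/5.0",
    }
    headers.update(extra_headers or {})
    return urllib.request.Request(api_url, headers=headers)


def fetch_json(req: urllib.request.Request):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```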

@[email protected]

If you’re open to Docker options, I’ve used and recommend ArchiveBox. It supports using a login to rip sites, and you can set it to rip once or on a schedule, etc.

I think they have a desktop app version in the works, if you’re looking for more of a one-time approach.

fiat_lux

HTTrack might do what you need.

@[email protected]
creator

HTTrack doesn’t let me log in to the website. The only login support it has is HTTP authentication, and this particular website uses a plain web login form.

fiat_lux

Depending on how they handle authentication, this might give you a way to make HTTrack look like your existing logged-in session: https://superuser.com/questions/157331/mirroring-a-web-site-behind-a-login-form
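The general idea there is to submit the login form yourself, keep the session cookies, and hand them to the mirroring tool. A Python sketch of the first two steps, assuming a plain form POST with no JavaScript or CAPTCHA involved. The field names `username`/`password` are placeholders for whatever `name` attributes the site’s login form actually uses, and many forms also carry a hidden CSRF token that has to be read from the form page first and passed in `extra_fields`:

```python
import http.cookiejar
import urllib.parse
import urllib.request
from typing import Optional


def build_login_request(login_url: str, username: str, password: str,
                        user_field: str = "username", pass_field: str = "password",
                        extra_fields: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST request that mimics submitting the site's login form."""
    fields = {user_field: username, pass_field: password, **(extra_fields or {})}
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        login_url,
        data=body,  # a body makes urllib send POST
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "User-Agent": "Mozilla/5.0"},
    )


def login_and_save_cookies(request: urllib.request.Request,
                           cookie_path: str = "cookies.txt") -> http.cookiejar.MozillaCookieJar:
    """Submit the login form and write the resulting cookies to a cookies.txt file."""
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(request, timeout=30) as resp:
        resp.read()
    # Session cookies have no expiry; keep them, or the login is lost on save.
    jar.save(ignore_discard=True)
    return jar
```

The saved file is in the Netscape format, which is what wget’s `--load-cookies` reads.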

@[email protected]
creator

Interesting idea. Unfortunately, the cookies weren’t in cleartext in the page headers. I found the cookie values in the browser’s network inspector and pasted them into HTTrack, but that didn’t work.

My HTTP cookie-fu is weak.
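One likely snag is the format: wget (`--load-cookies`) and yt-dlp (`--cookies`) expect a Netscape-format cookies.txt rather than raw values, and HTTrack’s `-b` option, per its help text, is about accepting cookies from a `cookies.txt`. A sketch that converts the whole `Cookie:` request header copied from the network inspector into that format; scoping every cookie to the entire domain with path `/` and no expiry is a simplifying assumption:

```python
def cookie_header_to_netscape(cookie_header: str, domain: str, secure: bool = True) -> str:
    """Turn 'a=1; b=2' (a Cookie request header from DevTools) into cookies.txt text.

    Every cookie is scoped to `domain` and its subdomains, with path "/" and an
    empty expiry field (a session cookie). These are simplifying assumptions.
    """
    domain = "." + domain.lstrip(".")  # leading dot goes with include-subdomains TRUE
    secure_flag = "TRUE" if secure else "FALSE"
    lines = ["# Netscape HTTP Cookie File"]
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if not sep or not name:
            continue  # skip empty or malformed fragments
        # Tab-separated fields: domain, include-subdomains, path, secure, expiry, name, value
        lines.append("\t".join([domain, "TRUE", "/", secure_flag, "", name, value]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: paste the real header and the site's domain here.
    print(cookie_header_to_netscape("sessionid=abc123; csrftoken=xyz", "example.com"))
```

Copying the complete header rather than picking out individual values avoids missing the one cookie that actually carries the session.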

Came here to say this. I don’t know how it handles a password-protected site.

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ