I have a problem logging in from my script. I found several good answers on Stack Overflow, but none of the solutions worked for me.
I am scraping a web forum for my PhD research; its URL is http://forum.axishistory.com.
The webpage I want to scrape is the memberlist, a page that lists the links to all member profiles. It can only be accessed when logged in. If you try to access the memberlist without logging in, it shows you the login form.
The URL of the memberlist is this: http://forum.axishistory.com/memberlist.php.
I tried the httr package:
members <- GET("http://forum.axishistory.com/memberlist.php", authenticate("username", "password"))
members_html <- html(members)
I also tried RCurl together with the XML package:
members_html <- htmlParse(getURL("http://forum.axishistory.com/memberlist.php", userpwd = "username:password"))
And I tried posting the login data through an httr handle:
handle <- handle("http://forum.axishistory.com/")
path <- "ucp.php?mode=login"
login <- list(
  amember_login = "username",
  amember_pass = "password"
)
response <- POST(handle = handle, path = path, body = login)
Thanks to Simon I found the answer here: Using RVest or httr to log in to non-standard forms on a webpage
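library(rvest)
url <- "http://forum.axishistory.com/memberlist.php"
# Open a session; without a login this URL returns the login form
pgsession <- html_session(url)
# Pick the login form from the list of forms on the page.
# The index [[1]] is an assumption; inspect html_form(pgsession) to find the right one.
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, "username" = "username", "password" = "password")
submit_form(pgsession, filled_form)
# Request the memberlist again within the same session
memberlist <- jump_to(pgsession, "http://forum.axishistory.com/memberlist.php")
# read_html() replaces html(), which is deprecated in newer rvest versions
page <- read_html(memberlist)
usernames <- html_nodes(x = page, css = "#memberlist .username")
data_usernames <- html_text(usernames, trim = TRUE)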
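Since the forum serves the login form instead of the memberlist when you are not logged in, one way to check whether a login attempt worked is to test whether the returned page still contains a password field. This is only a minimal sketch: `is_login_page` is a hypothetical helper name, the two HTML fragments are made up for illustration, and it assumes the forum's login form uses an `<input type="password">` element.

```r
library(rvest)

# Hypothetical helper: TRUE if the parsed page contains a password input,
# which is taken here as a sign that the login form was served.
is_login_page <- function(page) {
  length(html_nodes(page, "input[type='password']")) > 0
}

# Two made-up HTML fragments standing in for the real forum pages
login_html <- read_html('<form><input type="text" name="username"><input type="password" name="password"></form>')
member_html <- read_html('<table id="memberlist"><tr><td><a class="username">alice</a></td></tr></table>')

is_login_page(login_html)   # TRUE
is_login_page(member_html)  # FALSE
```

With the parsed memberlist from the code above, `is_login_page(page)` should then return FALSE if the login succeeded, provided the assumption about the password field holds.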