Python 3.9+ is necessary to run this
library. Poetry
is used for packaging and dependency management.
The API comprises a main class FAAPI, two submission classes Submission and SubmissionPartial, two journal
classes Journal and JournalPartial, two user classes User and UserPartial, and a comment class Comment.
Once FAAPI is initialized, its methods can be used to crawl FA and return parsed objects.
```python
from requests.cookies import RequestsCookieJar

import faapi
import orjson

cookies = RequestsCookieJar()
cookies.set("a", "38565475-3421-3f21-7f63-3d3413397537")
cookies.set("b", "356f5962-5a60-0922-1c11-65003b703038")

api = faapi.FAAPI(cookies)
sub, sub_file = api.submission(12345678, get_file=True)

print(sub.id, sub.title, sub.author, f"{len(sub_file) / 1024:.2f}KiB")

with open(f"{sub.id}.json", "wb") as f:
    f.write(orjson.dumps(dict(sub)))

with open(sub.file_url.split("/")[-1], "wb") as f:
    f.write(sub_file)

gallery, _ = api.gallery("user_name", 1)
with open("user_name-gallery.json", "wb") as f:
    f.write(orjson.dumps(list(map(dict, gallery))))
```

At init, the FAAPI object downloads the robots.txt file from FA to determine the Crawl-delay and disallow values set therein. If these are not set in the robots.txt file, a crawl delay of 1 second is used.
To respect this value, the default behaviour of the FAAPI object is to wait before performing a GET request if the last
request was made more recently than the crawl delay allows.
See under FAAPI for more details on this behaviour.
Furthermore, any get operation that points to a disallowed path from robots.txt will raise an exception. This check should not be circumvented, and the developer of this library does not take responsibility for violations of the TOS of Fur Affinity.
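For illustration, a disallowed request can be guarded as in the following sketch. The path shown is hypothetical (the actual disallow rules come from the live robots.txt), and the import location of the `DisallowedPath` exception, listed under the exceptions below, is assumed to be `faapi.exceptions`:

```python
import faapi
from faapi.exceptions import DisallowedPath  # import path assumed

api = faapi.FAAPI(cookies)  # cookies as described in #Cookies below

try:
    api.get("msg/pms/")  # hypothetical path; real rules are read from robots.txt at init
except DisallowedPath:
    print("Path disallowed by robots.txt")
```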
To access protected pages, cookies from an active session are needed. These cookies can be given to the FAAPI object as
a list of dictionaries, each containing a name and a value field, or as a `http.cookiejar.CookieJar`
object (`requests.cookies.RequestsCookieJar` and other objects inheriting from `CookieJar` are also supported). The
cookies list should look like the following example:

```python
cookies = [
    {"name": "a", "value": "38565475-3421-3f21-7f63-3d3413397537"},
    {"name": "b", "value": "356f5962-5a60-0922-1c11-65003b703038"},
]
```

The equivalent CookieJar version:

```python
from requests.cookies import RequestsCookieJar

cookies = RequestsCookieJar()
cookies.set("a", "38565475-3421-3f21-7f63-3d3413397537")
cookies.set("b", "356f5962-5a60-0922-1c11-65003b703038")
```

To access session cookies, consult the manual of the browser used to log in.
Note: it is important not to log out of the session the cookies belong to, otherwise they will no longer work.
Note: as of April 2022 only cookies a and b are needed.
FAAPI attaches a User-Agent header to every request. The user agent string is generated at startup in the following
format: `faapi/{library version} Python/{python version} {system name}/{system release}`.
This is the main object that handles all the calls to scrape pages and get submissions.
It holds the following fields:

- `session: requests.Session` the session used for all requests
- `robots: urllib.robotparser.RobotFileParser` the robots.txt handler
- `user_agent: str` the user agent used by the session (property, cannot be set)
- `crawl_delay: float` the crawl delay from robots.txt (property, cannot be set)
- `last_get: float` the time of the last GET (UNIX time)
- `raise_for_unauthorized: bool = True` if set to `True`, an exception is raised when a request is made and the resulting page is not from a login session
- `timeout: int | None = None` requests timeout in seconds for both page requests (e.g. submissions) and files
`__init__(cookies: list[dict[str, str]] | CookieJar, session_class: Type[Session] = Session)`
A FAAPI object must be initialised with a cookies object in the format mentioned above in #Cookies.
An optional session_class argument can be given to modify the class used by FAAPI.session. Any class based
on requests.Session is accepted.
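For example, a custom session class could be used to route all requests through a proxy. This is only a sketch of the mechanism: `ProxiedSession` is a hypothetical subclass, and it is assumed that FAAPI instantiates the class without arguments.

```python
import requests
import faapi

class ProxiedSession(requests.Session):
    """Hypothetical Session subclass that sends all requests through a local proxy."""
    def __init__(self):
        super().__init__()
        self.proxies.update({"https": "http://localhost:8080"})  # hypothetical proxy

api = faapi.FAAPI(cookies, session_class=ProxiedSession)
```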
`load_cookies(cookies: list[dict[str, str]] | CookieJar)`
Load new cookies and create a new session.
Note: this method removes any cookies currently in use; to update or add single cookies, access them from the session object.

`handle_delay()`
Handles the crawl delay as set in the robots.txt.

`check_path(path: str, *, raise_for_disallowed: bool = False) -> bool`
Checks whether a given path is allowed by the robots.txt. If `raise_for_disallowed` is set to `True`, a `DisallowedPath` exception is raised on non-allowed paths.

`connection_status -> bool`
Returns the status of the connection.

`login_status -> bool`
Returns the login status.

`get(path: str, **params) -> requests.Response`
Returns a Response object containing the result of the GET operation on the given URL with the optional `**params` added to it (the URL provided is considered a path from `https://www.furaffinity.net/`).

`get_parsed(path: str, *, skip_page_check: bool = False, skip_auth_check: bool = False, **params) -> bs4.BeautifulSoup`
Similar to `get()` but returns the parsed HTML from the GET operation. If the GET request encountered an error, an `HTTPError` exception is raised. If `skip_page_check` is set to `True`, the parsed page is not checked for errors (e.g. a non-existing submission). If `skip_auth_check` is set to `True`, the page is not checked for login status.

`me() -> User | None`
Returns the logged-in user as a `User` object if the cookies are from a login session.

`frontpage() -> list[SubmissionPartial]`
Fetches the latest submissions from Fur Affinity's front page.

`submission(submission_id: int, get_file: bool = False, *, chunk_size: int = None) -> tuple[Submission, bytes | None]`
Given a submission ID, returns a `Submission` object containing the various metadata of the submission itself, and a `bytes` object with the submission file if `get_file` is passed as `True`. The optional `chunk_size` argument is used for the request; if left as `None` or set to 0, the download is performed directly without streaming.
Note: the author `UserPartial` object of the submission does not contain the `join_date` field, as it does not appear on submission pages.

`submission_file(submission: Submission, *, chunk_size: int = None) -> bytes`
Given a submission object, downloads its file and returns it as a `bytes` object. The optional `chunk_size` argument is used for the request; if left as `None` or set to 0, the download is performed directly without streaming.

`journal(journal_id: int) -> Journal`
Given a journal ID, returns a `Journal` object containing the various metadata of the journal.

`user(user: str) -> User`
Given a username, returns a `User` object containing information regarding the user.

`gallery(user: str, page: int = 1) -> tuple[list[SubmissionPartial], int | None]`
Returns the list of submissions found on a specific gallery page, and the number of the next page. The returned page number is set to `None` if it is the last page (see the pagination sketch after the note below).

`scraps(user: str, page: int = 1) -> tuple[list[SubmissionPartial], int | None]`
Returns the list of submissions found on a specific scraps page, and the number of the next page. The returned page number is set to `None` if it is the last page.

`favorites(user: str, page: str = "") -> tuple[list[SubmissionPartial], str | None]`
Downloads a user's favorites page. Because of how favorites pages work on FA, the `page` argument (and the returned one) are strings. If the favorites page is the last one, `None` is returned as the next page. An empty page value as argument is equivalent to page 1.
Note: favorites page "numbers" do not follow any scheme and are only generated server-side.

`journals(user: str, page: int = 1) -> tuple[list[JournalPartial], int | None]`
Returns the list of journals found on a specific journals page, and the number of the next page. The returned page number is set to `None` if it is the last page.

`watchlist_to(user: str, page: int = 1) -> tuple[list[UserPartial], int | None]`
Given a username, returns a list of `UserPartial` objects for each user that is watching the given user, and the number of the next page, or `None` if it is the last page.

`watchlist_by(user: str, page: int = 1) -> tuple[list[UserPartial], int | None]`
Given a username, returns a list of `UserPartial` objects for each user that is watched by the given user, and the number of the next page, or `None` if it is the last page.
Note: the last page returned by watchlist_to and watchlist_by may not be correct, as Fur Affinity doesn't render the
next-page button consistently; it is safer to check externally whether the method is advancing the page but returning
the same or no users, as in the sketch below.
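The folder methods share the same pagination protocol, so whole folders can be collected with a simple loop. The following is a minimal sketch, assuming `api` is an initialised FAAPI object and "user_name" is a valid username; the second loop applies the defensive check suggested above for watchlists:

```python
# Collect a whole gallery by following the returned next-page numbers.
submissions, page = [], 1
while page is not None:
    page_submissions, page = api.gallery("user_name", page)
    submissions.extend(page_submissions)

# Defensive watchlist crawl: stop when no new users are returned
# or when the page number stops advancing.
watchers, seen, page = [], set(), 1
while page is not None:
    users, next_page = api.watchlist_to("user_name", page)
    new_users = [u for u in users if u.name_url not in seen]
    if not new_users:
        break
    watchers.extend(new_users)
    seen.update(u.name_url for u in new_users)
    page = None if next_page == page else next_page
```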
A stripped-down class that holds basic user information. It is used to hold metadata gathered when parsing a submission, journal, gallery, scraps, etc.
- `name: str` display name with capital letters and extra characters such as "_"
- `status: str` user status (~, !, etc.)
- `title: str` the user title as it appears on their userpage
- `join_date: datetime` the date the user joined (defaults to timestamp 0)
- `avatar_url: str` the URL to the user icon (used only when available)
- `user_tag: bs4.element.Tag` the user element used to parse information (placeholder, `UserPartial` is filled externally)
UserPartial objects can be directly cast to a dict object and iterated through.
Comparison with UserPartial can be made with either another UserPartial or User object (the URL names are
compared), or a string (the URL name is compared to the given string).
`__init__(user_tag: bs4.element.Tag = None)`
To initialise the object, an optional bs4.element.Tag object is needed containing the user element from a user page or
user folder.
If no user_tag is passed then the object fields will remain at their default - empty - value.
`name_url -> str`
Property method that returns the URL-safe username.

`url -> str`
Property method that returns the Fur Affinity URL to the user (`https://www.furaffinity.net/user/{name_url}`).

`generate_avatar_url() -> str`
Generates the URL for the current user icon.

`parse(user_tag: bs4.element.Tag = None)`
Parses the stored user tag for metadata. If `user_tag` is passed, it overwrites the existing `user_tag` value.
The main class storing all of a user's metadata.
- `name: str` display name with capital letters and extra characters such as "_"
- `status: str` user status (~, !, etc.)
- `title: str` the user title as it appears on their userpage
- `join_date: datetime` the date the user joined (defaults to timestamp 0)
- `profile: str` profile text in HTML format
- `profile_bbcode: str` profile text in BBCode format
- `stats: UserStats` user statistics stored in a namedtuple (`views`, `submissions`, `favorites`, `comments_earned`, `comments_made`, `journals`, `watched_by`, `watching`)
- `info: dict[str, str]` profile information (e.g. "Accepting Trades", "Accepting Commissions", "Character Species", etc.)
- `contacts: dict[str, str]` contact links (e.g. Twitter, Steam, etc.)
- `avatar_url: str` the URL to the user icon
- `banner_url: str | None` the URL to the user banner (if any is set, otherwise `None`)
- `watched: bool` `True` if the user is watched, `False` otherwise
- `watched_toggle_link: str | None` the link to toggle the watch status (`/watch/` or `/unwatch/` type link)
- `blocked: bool` `True` if the user is blocked, `False` otherwise
- `blocked_toggle_link: str | None` the link to toggle the block status (`/block/` or `/unblock/` type link)
- `user_page: bs4.BeautifulSoup` the user page used to parse the object fields
User objects can be directly cast to a dict object and iterated through.
Comparison with User can be made with either another User or UserPartial object (the URL names are compared), or a
string (the URL name is compared to the given string).
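For example, a minimal sketch, assuming `api` is an initialised FAAPI object with login cookies and "user_name" is a valid username:

```python
user = api.user("user_name")
print(user.name, user.join_date)
print(user.stats.watched_by)  # field of the UserStats named tuple
print(user == "user_name")    # True: the URL name is compared to the string
user_dict = dict(user)        # cast to dict, e.g. for JSON serialisation
```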
`__init__(user_page: bs4.BeautifulSoup = None)`
To initialise the object, an optional bs4.BeautifulSoup object is needed containing the parsed HTML of a user
page.
If no user_page is passed then the object fields will remain at their default - empty - value.
`name_url -> str`
Property method that returns the URL-safe username.

`url -> str`
Property method that returns the Fur Affinity URL to the user (`https://www.furaffinity.net/user/{name_url}`).

`generate_avatar_url() -> str`
Generates the URL for the current user icon.

`parse(user_page: bs4.BeautifulSoup = None)`
Parses the stored user page for metadata. If `user_page` is passed, it overwrites the existing `user_page` value.
This object contains partial information gathered when parsing a journals folder. It contains the following fields:
- `id: int` journal ID
- `title: str` journal title
- `rating: str` journal rating
- `date: datetime` upload date as a `datetime` object (defaults to timestamp 0)
- `author: UserPartial` journal author (filled only if the journal is parsed from a `bs4.BeautifulSoup` page)
- `stats: JournalStats` journal statistics stored in a named tuple (`comments` (count))
- `content: str` journal content in HTML format
- `content_bbcode: str` journal content in BBCode format
- `mentions: list[str]` the users mentioned in the content (if they were mentioned as links, e.g. `:iconusername:`, `@username`, etc.)
- `journal_tag: bs4.element.Tag` the journal tag used to parse the object fields
JournalPartial objects can be directly cast to a dict object or iterated through.
Comparison with JournalPartial can be made with either another JournalPartial or Journal object (the IDs are
compared), or an integer (the JournalPartial.id value is compared to the given integer).
`__init__(journal_tag: bs4.element.Tag = None)`
JournalPartial takes one optional parameter: a journal section tag from a journals page.
If no journal_tag is passed then the object fields will remain at their default - empty - value.
`url -> str`
Property method that returns the Fur Affinity URL to the journal (`https://www.furaffinity.net/journal/{id}`).

`parse(journal_tag: bs4.element.Tag = None)`
Parses the stored journal tag for information. If `journal_tag` is passed, it overwrites the existing `journal_tag` value.
This object contains full information gathered when parsing a journal page. It contains the same fields
as JournalPartial, with the addition of header, footer, and comments:

- `id: int` journal ID
- `title: str` journal title
- `rating: str` journal rating
- `date: datetime` upload date as a `datetime` object (defaults to timestamp 0)
- `author: UserPartial` journal author (filled only if the journal is parsed from a `bs4.BeautifulSoup` page)
- `stats: JournalStats` journal statistics stored in a named tuple (`comments` (count))
- `content: str` journal content in HTML format
- `content_bbcode: str` journal content in BBCode format
- `header: str` journal header in HTML format (if present)
- `footer: str` journal footer in HTML format (if present)
- `mentions: list[str]` the users mentioned in the content (if they were mentioned as links, e.g. `:iconusername:`, `@username`, etc.)
- `comments: list[Comment]` the comments to the journal, organised in a tree structure
- `journal_page: bs4.BeautifulSoup` the journal page used to parse the object fields
Journal objects can be directly cast to a dict object or iterated through.
Comparison with Journal can be made with either another Journal or JournalPartial object (the IDs are compared),
or an integer (the Journal.id value is compared to the given integer).
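A minimal usage sketch, assuming `api` is an initialised FAAPI object and the journal ID exists:

```python
journal = api.journal(12345678)
print(journal.title, journal.date)
print(journal.stats.comments)     # comment count from the JournalStats named tuple
for comment in journal.comments:  # top-level comments of the tree
    print(comment.id, len(comment.replies))
```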
`__init__(journal_page: bs4.BeautifulSoup = None)`
Journal takes one optional journal page argument.
If no journal_page is passed then the object fields will remain at their default - empty - value.
`url -> str`
Property method that returns the Fur Affinity URL to the journal (`https://www.furaffinity.net/journal/{id}`).

`parse(journal_page: bs4.BeautifulSoup = None)`
Parses the stored journal page for information. If `journal_page` is passed, it overwrites the existing `journal_page` value.
This lightweight submission object is used to contain the information gathered when parsing gallery, scraps, and favorites pages. It contains only the following fields:
- `id: int` submission ID
- `title: str` submission title
- `author: UserPartial` submission author (only the `name` field is filled)
- `rating: str` submission rating [general, mature, adult]
- `type: str` submission type [text, image, etc.]
- `thumbnail_url: str` the URL to the submission thumbnail
- `submission_figure: bs4.element.Tag` the figure tag used to parse the object fields
SubmissionPartial objects can be directly cast to a dict object or iterated through.
Comparison with SubmissionPartial can be made with either another SubmissionPartial or Submission object (the IDs are
compared), or an integer (the SubmissionPartial.id value is compared to the given integer).
`__init__(submission_figure: bs4.element.Tag = None)`
To initialise the object, an optional bs4.element.Tag object is needed containing the parsed HTML of a submission
figure tag.
If no submission_figure is passed then the object fields will remain at their default - empty - value.
`url -> str`
Property method that returns the Fur Affinity URL to the submission (`https://www.furaffinity.net/view/{id}`).

`parse(submission_figure: bs4.element.Tag = None)`
Parses the stored submission figure tag for information. If `submission_figure` is passed, it overwrites the existing `submission_figure` value.
The main class that parses and holds submission metadata.
- `id: int` submission ID
- `title: str` submission title
- `author: UserPartial` submission author (only the `name`, `title`, and `avatar_url` fields are filled)
- `date: datetime` upload date as a `datetime` object (defaults to timestamp 0)
- `tags: list[str]` tags list
- `category: str` category
- `species: str` species
- `gender: str` gender
- `rating: str` rating
- `stats: SubmissionStats` submission statistics stored in a named tuple (`views`, `comments` (count), `favorites`)
- `type: str` submission type (text, image, etc.)
- `description: str` description in HTML format
- `description_bbcode: str` description in BBCode format
- `footer: str` footer in HTML format
- `mentions: list[str]` the users mentioned in the description (if they were mentioned as links, e.g. `:iconusername:`, `@username`, etc.)
- `folder: str` the submission folder (gallery or scraps)
- `user_folders: list[SubmissionUserFolder]` user folders stored in a list of named tuples (`name`, `url`, `group` (if any))
- `file_url: str` the URL to the submission file
- `thumbnail_url: str` the URL to the submission thumbnail
- `prev: int` the ID of the previous submission (if any); see the sketch below
- `next: int` the ID of the next submission (if any)
- `favorite: bool` `True` if the submission is a favorite, `False` otherwise
- `favorite_toggle_link: str` the link to toggle the favorite status (`/fav/` or `/unfav/` type URL)
- `comments: list[Comment]` the comments to the submission, organised in a tree structure
- `submission_page: bs4.BeautifulSoup` the submission page used to parse the object fields
Submission objects can be directly cast to a dict object and iterated through.
Comparison with Submission can be made with either another Submission or SubmissionPartial object (the IDs are
compared), or an integer (the Submission.id value is compared to the given integer).
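The prev and next fields allow walking through a user's submissions in order. A sketch, assuming `api` is an initialised FAAPI object and that `prev` is falsy when there is no previous submission:

```python
sub, _ = api.submission(12345678)
while sub.prev:  # assumed falsy when there is no previous submission
    sub, _ = api.submission(sub.prev)
    print(sub.id, sub.title)
```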
`__init__(submission_page: bs4.BeautifulSoup = None)`
To initialise the object, an optional bs4.BeautifulSoup object is needed containing the parsed HTML of a submission
page.
If no submission_page is passed then the object fields will remain at their default - empty - value.
`url -> str`
Property method that returns the Fur Affinity URL to the submission (`https://www.furaffinity.net/view/{id}`).

`parse(submission_page: bs4.BeautifulSoup = None)`
Parses the stored submission page for metadata. If `submission_page` is passed, it overwrites the existing `submission_page` value.
This object class contains comment metadata and is used to build a tree structure with the comments and their replies.
- `id: int` the comment ID
- `author: UserPartial` the user who posted the comment
- `date: datetime` the date the comment was posted
- `text: str` the comment text in HTML format
- `text_bbcode: str` the comment text in BBCode format
- `replies: list[Comment]` list of replies to the comment
- `reply_to: Comment | int | None` the parent comment, if the comment is a reply. The variable type is `int` only if the comment is parsed outside the parse method of a `Submission` or `Journal` (e.g. by creating a new comment with a comment tag), and when iterating over the parent object (to avoid infinite recursion errors), be it `Submission`, `Journal`, or another `Comment`
- `edited: bool` `True` if the comment was edited, `False` otherwise
- `hidden: bool` `True` if the comment was hidden, `False` otherwise (if the comment was hidden, the author and date fields will default to their empty values)
- `parent: Submission | Journal | None` the `Submission` or `Journal` object the comments are connected to
- `comment_tag: bs4.element.Tag` the comment tag used to parse the object fields
Comment objects can be directly cast to a dict object and iterated through.
Comparison with Comment can be made with either another comment (the IDs are compared), or an integer
(the Comment.id value is compared to the given integer).
Note: The __iter__ method of Comment objects automatically removes recursion. The parent variable is set
to None and reply_to is set to the comment's ID.
Note: Because each comment contains the parent Submission or Journal object (which contains the comment itself)
and the replied comment object, some iterations may cause infinite recursion errors, for example when using
the copy.deepcopy function. If such iterations are needed, simply set the parent variable to None and
the reply_to variable to None or the comment's ID (this can be done easily after flattening the comments list
with faapi.comment.flatten_comments, the comments can then be sorted again with faapi.comment.sort_comments which
will also restore the reply_to values to Comment objects).
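A sketch of that workaround, assuming `submission` is a parsed Submission object and that the Comment class and helper functions are importable from faapi.comment:

```python
from copy import deepcopy
from faapi.comment import Comment, flatten_comments, sort_comments  # import paths assumed

flat = flatten_comments(submission.comments)
for comment in flat:
    comment.parent = None  # break the comment -> parent cycle
    if isinstance(comment.reply_to, Comment):
        comment.reply_to = comment.reply_to.id  # keep only the replied-to ID
copied = deepcopy(flat)       # safe now: no recursive references remain
tree = sort_comments(copied)  # rebuild the tree and restore reply_to objects
```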
`__init__(tag: bs4.element.Tag = None, parent: Submission | Journal = None)`
To initialise the object, an optional bs4.element.Tag object is needed containing the comment tag as taken from a
submission/journal page.
The optional parent argument sets the parent variable described above.
If no tag is passed then the object fields will remain at their default - empty - value.
`url -> str`
Property method that returns the Fur Affinity URL to the comment (e.g. `https://www.furaffinity.net/view/12345678#cid:1234567890`). If the `parent` variable is `None`, the property returns an empty string.

`parse(tag: bs4.element.Tag = None)`
Parses the stored tag for metadata. If `tag` is passed, it overwrites the existing `tag` value.
These extra functions can be used to operate on a list of comments. They only alter the order and structure, but they do not touch any of the metadata.

`faapi.comment.sort_comments(comments: list[Comment]) -> list[Comment]`
Sorts a list of comments into a tree structure. Replies are overwritten.

`faapi.comment.flatten_comments(comments: list[Comment]) -> list[Comment]`
Flattens a list of comments. Replies are not modified.
Using the tree structure generated by the library, it is trivial to build a graph visualisation of the comment tree using the DOT language.

```python
submission, _ = api.submission(12345678)
comments = faapi.comment.flatten_comments(submission.comments)
with open("comments.dot", "w") as f:
    f.write("digraph {\n")
    for comment in [c for c in comments if c.reply_to is None]:
        f.write(f"    parent -> {comment.id}\n")
    for comment in comments:
        for reply in comment.replies:
            f.write(f"    {comment.id} -> {reply.id}\n")
    f.write("}")
```

```
digraph {
    parent -> 157990848
    parent -> 157993838
    parent -> 157997294
    157990848 -> 158014077
    158014077 -> 158014816
    158014816 -> 158093180
    158093180 -> 158097024
    157993838 -> 157998464
    157993838 -> 158014126
    157997294 -> 158014135
    158014135 -> 158014470
    158014135 -> 158030074
    158014470 -> 158093185
    158030074 -> 158093199
}
```

A rendering of the DOT graph above can be generated with quickchart.io.
The BBCode fields allow converting between the raw HTML recovered from Fur Affinity and BBCode tags that follow FA's guidelines. Conversion from HTML to BBCode covers all known tags and preserves all newlines and spacing.
BBCode text can be converted to Fur Affinity's HTML using the faapi.parse.bbcode_to_html() function. The majority of
submissions can be converted back and forth between HTML and BBCode without any information loss; however, the parser
rules are still a work in progress and there are many edge cases where unusual text and formatting cause the parser to
generate incorrect HTML.
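A round-trip sketch, assuming `api` is an initialised FAAPI object (and keeping in mind the caveat above about lossless conversion):

```python
import faapi.parse

sub, _ = api.submission(12345678)
bbcode = sub.description_bbcode            # BBCode rendering of the HTML description
html = faapi.parse.bbcode_to_html(bbcode)  # convert back to FA-style HTML
```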
The following are the exceptions explicitly raised by the FAAPI functions. The exceptions deriving from ParsingError
are chosen depending on the content of the page. Because Fur Affinity doesn't use HTTP status codes besides 404, the
page is checked against a static list of known error messages/page titles in order to determine the specific error to be
used. If no match is found, then the ServerError (if the page has the "Server Error" title) or the more
general NoticeMessage exceptions are used instead. The actual error message parsed from the page is used as argument
for the exceptions, so that it can be analysed when caught.
- `DisallowedPath(Exception)` the path is not allowed by the robots.txt
- `Unauthorized(Exception)` the user is not logged in
- `ParsingError(Exception)` an error occurred while parsing the page
- `NonePage(ParsingError)` the parsed page is `None`
- `NotFound(ParsingError)` the resource could not be found (general 404 page or non-existing submission, user, or journal)
- `NoTitle(ParsingError)` the parsed page is missing a title
- `DisabledAccount(ParsingError)` the resource belongs to a disabled account
- `ServerError(ParsingError)` the page contains a server error notice
- `NoticeMessage(ParsingError)` a notice of unknown type was found in the page
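A sketch of catching these, assuming the exception classes are importable from faapi.exceptions:

```python
from faapi.exceptions import NotFound, ParsingError  # import path assumed

try:
    sub, _ = api.submission(12345678)
except NotFound as err:
    print("Not found:", err)      # the parsed error message is the exception argument
except ParsingError as err:
    print("Parsing error:", err)  # catches the remaining ParsingError subclasses
```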
When parsing some pages or converting HTML to BBCode, the Beautiful Soup
library may emit warnings, for example `MarkupResemblesLocatorWarning`. These warnings are left enabled for
clarity, but they can be disabled manually using the `warnings.filterwarnings` function.
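For example:

```python
import warnings
from bs4 import MarkupResemblesLocatorWarning

# Suppress the specific Beautiful Soup warning mentioned above.
warnings.filterwarnings("ignore", category=MarkupResemblesLocatorWarning)
```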
All contributions and suggestions are welcome!
If you have suggestions for fixes or improvements, you can open an issue with your idea; see #Issues for details.
If any problem is encountered while using the library, an issue can be opened on GitHub.
Issues can also be used to suggest improvements and features.
When opening an issue for a problem, please copy the error message and describe the operation in progress when the error occurred.
