Scraping Review Link information

NanMelch · January 11, 2022, 6:47pm

Hello,
I’m working on a script to scrape the data in review links so I can share the assets that are in review links on a csv with my team.

I’ve gotten to the point where I have a list of dictionaries with all the information I want. I’m quite sure I’m just missing a glaringly obvious thing with iterating through lists and dictionaries, but I’ve been staring at the same thing for so long, I thought I would ask for some assistance.

Looking at this info, I’m hoping to dig into the children and get a few pieces for the spreadsheet:

I had this little loop going:

children_in_review_link = {}
for i in range(len(items_in_review_link)):
    #pprint.pprint(items_in_review_link[i])
    if items_in_review_link[i][0]['asset']['children'] != []:
        review_link_children.append(items_in_review_link[i][0]['asset']['children'])
        print(items_in_review_link[i][0]['asset']['children'])
        for j in range(len(items_in_review_link[i][0]['asset']['children'])):
            children_in_review_link = {
                'child_name': items_in_review_link[i][0]['asset']['children'][j]['name'],
                'child_type': items_in_review_link[i][0]['asset']['children'][j]['type'],
                'child_size': items_in_review_link[i][0]['asset']['children'][j]['filesize'],
                'child_filetype': items_in_review_link[i][0]['asset']['children'][j]['filetype'],
            }
            review_link_children.append(children_in_review_link)

pprint.pprint(review_link_children)

and it indeed does not work! I get a “list out of range” error for the line if items_in_review_link[i][0]['asset']['children'] != []:

I had put in the pprint.pprint(items_in_review_link[i][0]) line as a check, and it kicks back a “list index out of range” error on the last item in the range.

Any direction / help would be greatly appreciated!

Nancy

nath · January 12, 2022, 3:28pm

Are all the assets in a list under the children key?

If so could you just run something like this?

review_link_children = []
for item in overall_review_payload:
    children_list = item['asset']['children']
    for child in children_list:
        try:
            children_in_review_link = {'child_name': child['name'], 'child_type': child['type']}
        except KeyError as e:
            # Skip this child so a stale or undefined dict is never appended
            print("Doing something with my key errors")
            continue
        review_link_children.append(children_in_review_link)

Not sure of any of the quirks you’re seeing in the payload, but this seems like the simple way of pulling those data points and placing them in a list

Maybe you could add something to check if the children key list isn’t empty too

(ignore the indents and formatting btw… )
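
The empty-children check suggested above could be sketched roughly like this. It assumes each item is a plain dict shaped like the payload in this thread (an 'asset' key holding a 'children' list); the function name and the sample payload are made up for illustration.

```python
from typing import Dict, List


def collect_children(overall_review_payload: List[Dict]) -> List[Dict]:
    """Collect name/type for every child asset, skipping items without children."""
    review_link_children = []
    for item in overall_review_payload:
        # .get() returns None instead of raising KeyError when a key is absent,
        # and "or" swaps in an empty default so the loop below simply does nothing
        children_list = (item.get("asset") or {}).get("children") or []
        for child in children_list:
            if "name" not in child or "type" not in child:
                continue  # skip malformed children rather than failing the whole run
            review_link_children.append(
                {"child_name": child["name"], "child_type": child["type"]}
            )
    return review_link_children


# Hypothetical payload, for illustration only
sample_payload = [
    {"asset": {"children": [{"name": "shot_010.mov", "type": "file"}]}},
    {"asset": {"children": []}},  # empty children list
    {"asset": {}},                # no children key at all
]
print(collect_children(sample_payload))
```

Checking for the keys up front, instead of catching KeyError, keeps the loop from ever appending a dict left over from a previous iteration.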

NanMelch · January 14, 2022, 10:21pm

Just wanted to say an actual thanks for helping out.
I discovered that there was a broken review link and that’s why I kept getting range errors.
My next step will be a more sophisticated try / except for finding those!

jhodges · January 18, 2022, 11:10pm

That would definitely do it!

Feel free to follow-up on this thread if you’re still having issues, and I can maybe throw together some more code samples for you.

NanMelch · January 24, 2022, 8:46pm

Thanks!

A couple questions:
-Is there a check I can run at the start of the loop to look for / skip broken links?
-Is it possible to do the same type of scraping for presentation links? I went to the API Reference Docs (https://developer.frame.io/api/reference/) and received 404 File Not Found though I had remembered there was a "https://api.frame.io/v2/projects/" + project_id + "/presentations" call available. From there, is there a way to scrape the presentation link data to see what assets are associated with the link?

Thanks for your help!

jhodges · March 14, 2022, 11:36pm

Heya!

There sure is, check out this API endpoint. You’d be looking in the response for the presentation_items object.
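
On the first question (skipping broken links at the start of the loop), here is a minimal sketch. It assumes a broken review link shows up in items_in_review_link as an empty entry, which would explain the “list index out of range” error on [0] earlier in the thread; that shape is an inference, and the function name is hypothetical.

```python
from typing import Any, List, Tuple


def split_broken_links(items_in_review_link: List[Any]) -> Tuple[List[Any], List[int]]:
    """Return (usable entries, indexes of entries that look broken)."""
    usable, broken_indexes = [], []
    for index, entry in enumerate(items_in_review_link):
        # An empty list has no [0], which is what raises IndexError
        if not isinstance(entry, list) or not entry:
            broken_indexes.append(index)
            continue
        first_item = entry[0]
        # Entries without an 'asset' dict cannot be scraped either
        if not isinstance(first_item, dict) or "asset" not in first_item:
            broken_indexes.append(index)
            continue
        usable.append(entry)
    return usable, broken_indexes


# Hypothetical data, for illustration only
sample_links = [
    [{"asset": {"children": []}}],  # looks fine
    [],                             # broken link: nothing came back
    [{"unexpected": True}],         # item without an 'asset' key
]
usable, broken_indexes = split_broken_links(sample_links)
print(f"{len(usable)} usable, broken at indexes {broken_indexes}")
```

Keeping the broken indexes around, rather than silently dropping them, makes it easier to go back and fix or delete those links later.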

NanMelch · June 6, 2022, 2:26am

Hi again,
I’m still plugging away at trying to scrape review link info. I have an ok solution, but was hoping to really scrape what’s in the review links specifically. I attempted to copy the asset scraper function, but to no avail.

My function is this:

def scrape_review_link_data(
    client: FrameioClient,
    rev_lnk_asset_id: str, #is this project_id instead for this?
    review_link_list: List[Dict]
) -> List[Dict]:
    """
    Takes an initialized client and an asset_id or project_id maybe representing a position in a directory tree.
    Recursively builds a list of review link data, maybe.  Returns a list of dicts.
    """
    review_link_assets = items_in_review_link
    review_link_list = []

    for rev_lnk_asset in review_link_assets:

            if rev_lnk_asset[0][0]["asset"]["type"] == "folder" and rev_lnk_asset != []:
                # Include non-empty folders in the list of scraped assets
                review_link_list.append(rev_lnk_asset)
                scrape_review_link_data(client, rev_lnk_asset["id"], review_link_assets)

            if rev_lnk_asset[0][0]["asset"]["type"] == "file":
                review_link_list.append(rev_lnk_asset)

            if rev_lnk_asset[0][0]["asset"]["type"] == "version_stack":
                versionz = items_in_review_link(rev_lnk_asset["id"])
                review_link_list.append(rev_lnk_asset)
                for vz_asset in versionz:
                    review_link_list.append(vz_asset)


    return review_link_list

items_in_review_link is a list of dictionaries I created and is part of the original solve for this. However, when I run the function I get a "Key Error 0."

The way I have it now, I scrape the assets in a project. And then I find all the review links in a project. And then I go through and try to match them up. I would really love to just have a function tell me what assets belong with which review links. 

Any help would be appreciated! 
Nancy

jhodges · June 7, 2022, 8:02pm

Hiya!

I put together a code sample yesterday for scraping assets from review links. It’s not 100% what you’re looking for yet, but I think it’ll help you better understand what it will take.

One thing I might add later today is a “flattener” function to flatten the nested children here.

Unfortunately there is no way via our API to just “check” if a given asset belongs to any review links at this time.

from pprint import pprint
from frameioclient import FrameioClient
from typing import Dict, List

def scrape_review_link_data(
    client: FrameioClient,
    review_link_id: str,
) -> List[Dict]:
    """
    Takes an initialized client and the ID of a review link.
    Recursively collects the assets under each item in the review link and prints their type and name.  Returns a list of nested dicts.
    """
    full_asset_list = []

    review_link_root_assets = client.review_links.get_assets(review_link_id)

    for folder in review_link_root_assets:
        temp_assets = client.helpers.get_assets_recursively(folder['asset_id'])
        for asset in temp_assets:
            full_asset_list.append(asset)

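    for rev_lnk_asset in full_asset_list:
        # pprint(rev_lnk_asset)
        print(f"Type: {rev_lnk_asset['type']}, Name: {rev_lnk_asset['name']}")

    # Return the collected assets so the result matches the List[Dict] annotation
    return full_asset_list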

if __name__ == "__main__":
    token = "[YOUR_TOKEN]"
    review_link_id = "62dee239-4678-d0db-9ae9-79a8e6e9eea4"

    client = FrameioClient(token)
    scrape_review_link_data(client, review_link_id)
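
The “flattener” mentioned above was not posted in this thread. As a rough sketch of the idea, assuming nested assets sit under a 'children' key as in the earlier payloads, something like the following would turn the tree into flat rows that can go straight into a CSV. The function name and the sample data are hypothetical.

```python
import csv
import sys
from typing import Dict, Iterator, List


def flatten_assets(assets: List[Dict]) -> Iterator[Dict]:
    """Yield every asset in a nested tree, parents before children."""
    for asset in assets:
        # Emit the asset itself without its nested list, so each row is flat
        yield {key: value for key, value in asset.items() if key != "children"}
        # Then recurse into whatever is nested under it (folders, version stacks)
        yield from flatten_assets(asset.get("children") or [])


# Hypothetical nested data, for illustration only
nested_assets = [
    {
        "name": "Dailies",
        "type": "folder",
        "children": [
            {"name": "a.mov", "type": "file"},
            {
                "name": "Stack",
                "type": "version_stack",
                "children": [{"name": "a_v2.mov", "type": "file"}],
            },
        ],
    }
]
flat_rows = list(flatten_assets(nested_assets))

# Flat rows drop straight into a CSV; keys not listed in fieldnames are ignored
writer = csv.DictWriter(sys.stdout, fieldnames=["name", "type"], extrasaction="ignore")
writer.writeheader()
writer.writerows(flat_rows)
```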

If you’d like, you can grab some time with me via my Calendly to go over in more detail what it is you’re trying to achieve, and maybe I can help!

NanMelch · June 29, 2022, 2:31pm

Thanks again, Jeff.
This is slowly getting me there. I throw an error eventually with the “type” key and I think I’m missing the correct cutoff in my code. But for the theory for what I’m trying to accomplish, this helps out so much.

jhodges · July 21, 2022, 11:39pm

Glad I could help! My guess is the type issue might be related to running into a version_stack which would be different than a file or folder type.
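
One way to narrow that down, sketched under the assumption that each scraped asset is a plain dict: group the assets by whatever 'type' they report, and use .get() so an entry with no 'type' key lands in an "unknown" bucket instead of raising a KeyError. The function name and sample data here are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def group_assets_by_type(assets: List[Dict]) -> Dict[str, List[str]]:
    """Map each asset type to the names of the assets that report it."""
    grouped = defaultdict(list)
    for asset in assets:
        # .get() with a default avoids KeyError on entries that lack the key
        asset_type = asset.get("type", "unknown")
        grouped[asset_type].append(asset.get("name", "<unnamed>"))
    return dict(grouped)


# Hypothetical assets, for illustration only
sample_assets = [
    {"type": "file", "name": "a.mov"},
    {"type": "version_stack", "name": "Stack"},
    {"name": "mystery_item"},  # no 'type' key at all
]
for asset_type, names in group_assets_by_type(sample_assets).items():
    print(f"{asset_type}: {names}")
```

Anything that ends up under "unknown" is a good candidate for a pprint to see what the payload actually looks like.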

NanMelch · August 12, 2022, 10:54pm

Hi @jhodges (and everyone else)!
I was trying to create some review links and used the code in the API page; the review link was created, but it didn’t have anything in it! Do I need to follow up creating the link with putting assets in it? I get that presentation links and review links are very different on the back-end, so I wanted to check this out before I went down a rabbit hole of misunderstanding, because I created presentation links using the p-link code on the API page and it worked like a dream!

Thanks!

jhodges · August 25, 2022, 9:41pm

That’s correct, once you create the review link in a given project, you then need to make additional API calls to add items to it.

Here’s the documentation page for review links, and the API reference for that endpoint.
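
The two-step flow described above (create the link, then add items in a separate call) might look roughly like this with the Python SDK. Treat the method names as assumptions: the sketch assumes client.review_links.create(project_id, name=...) and client.review_links.update_assets(link_id, asset_ids=[...]) exist in your installed frameioclient version, so check them against the SDK source before relying on this. The wrapper function name is hypothetical.

```python
from typing import Any, Dict, List


def create_review_link_with_assets(
    client: Any,
    project_id: str,
    name: str,
    asset_ids: List[str],
) -> Dict:
    """Create a review link, then attach assets to it in a second call."""
    # Step 1: create the review link in the project; it starts out empty.
    # ASSUMPTION: review_links.create(project_id, **kwargs) returns a dict with an 'id'.
    review_link = client.review_links.create(project_id, name=name)

    # Step 2: a separate call adds the assets to the link.
    # ASSUMPTION: review_links.update_assets(link_id, asset_ids=[...]) is available.
    client.review_links.update_assets(review_link["id"], asset_ids=asset_ids)
    return review_link


# Usage (placeholders, not real IDs):
# client = FrameioClient("[YOUR_TOKEN]")
# create_review_link_with_assets(client, "[PROJECT_ID]", "Dailies", ["[ASSET_ID]"])
```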
