Scraping Review Link information

Hello,
I’m working on a script to scrape the data in review links so I can share the assets that are in review links on a csv with my team.

I’ve gotten to the point where I have a list of dictionaries with all the information I want. I’m quite sure I’m just missing something glaringly obvious about iterating through lists and dictionaries, but I’ve been staring at the same thing for so long that I thought I would ask for some assistance.

Looking at this info, I’m hoping to dig into the children and get a few pieces for the spreadsheet:

I had this little loop going:

children_in_review_link = {}
for i in range(len(items_in_review_link)):
    #pprint.pprint(items_in_review_link[i])
    if items_in_review_link[i][0]['asset']['children'] != []:
        review_link_children.append(items_in_review_link[i][0]['asset']['children'])
        print(items_in_review_link[i][0]['asset']['children'])
        for j in range(len(items_in_review_link[i][0]['asset']['children'])):
            children_in_review_link = {
                'child_name': items_in_review_link[i][0]['asset']['children'][j]['name'],
                'child_type': items_in_review_link[i][0]['asset']['children'][j]['type'],
                'child_size': items_in_review_link[i][0]['asset']['children'][j]['filesize'],
                'child_filetype': items_in_review_link[i][0]['asset']['children'][j]['filetype'],
            }
            review_link_children.append(children_in_review_link)

pprint.pprint(review_link_children)

and it indeed does not work! I get a “list index out of range” error for the line if items_in_review_link[i][0]['asset']['children'] != []:

I had put in the pprint.pprint(items_in_review_link[i][0]) line as a check, and it kicks back the same “list index out of range” error on the last item in the range.

Any direction / help would be greatly appreciated!

Nancy

Are all the assets in a list under the children key?

If so could you just run something like this?

review_link_children = []
for item in overall_review_payload:
    children_list = item['asset']['children']
    for child in children_list:
        try:
            children_in_review_link = {'child_name': child['name'], 'child_type': child['type']}
        except KeyError as e:
            # Skip this child so a stale or undefined dict is never appended
            print(f"Skipping child with missing key: {e}")
            continue
        review_link_children.append(children_in_review_link)

Not sure of any of the quirks you’re seeing in the payload, but this seems like the simple way of pulling those data points and placing them in a list

Maybe you could add something to check if the children key list isn’t empty too

(ignore the indents and formatting btw… :grin:)
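
For the empty-children check, a minimal sketch could look like the following. It assumes each entry has the shape used in the original loop (a list whose first element holds an 'asset' dict). The names collect_children and write_children_csv are hypothetical helpers, not part of any Frame.io library, and missing keys fall back to an empty string.

```python
import csv
from typing import Dict, List


def collect_children(items_in_review_link: List[list]) -> List[Dict]:
    """Collect one row per child asset, skipping empty or childless entries."""
    rows = []
    for entry in items_in_review_link:
        if not entry:
            # An empty entry (e.g. from a broken review link) has no [0] to index
            continue
        asset = entry[0].get('asset', {})
        for child in asset.get('children') or []:
            rows.append({
                'child_name': child.get('name', ''),
                'child_type': child.get('type', ''),
                'child_size': child.get('filesize', ''),
                'child_filetype': child.get('filetype', ''),
            })
    return rows


def write_children_csv(rows: List[Dict], path: str) -> None:
    """Write the collected rows to a CSV file for sharing."""
    fieldnames = ['child_name', 'child_type', 'child_size', 'child_filetype']
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Iterating over the entries directly, rather than over range(len(...)), also removes most of the repeated indexing from the original loop.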

Just wanted to say an actual thanks for helping out.
I discovered that there was a broken review link and that’s why I kept getting range errors.
My next step will be a more sophisticated try / except for finding those!

That would definitely do it!

Feel free to follow-up on this thread if you’re still having issues, and I can maybe throw together some more code samples for you.
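
In the meantime, a try / except for spotting broken links might look roughly like this. It is only a sketch: it assumes a broken review link shows up as an empty or malformed entry in items_in_review_link, and split_broken_links is a hypothetical helper name.

```python
from typing import List, Tuple


def split_broken_links(items_in_review_link: List[list]) -> Tuple[List[dict], List[int]]:
    """Return (usable assets, positions of entries that could not be read)."""
    good, broken = [], []
    for position, entry in enumerate(items_in_review_link):
        try:
            asset = entry[0]['asset']
        except (IndexError, KeyError, TypeError) as err:
            # IndexError: empty entry; KeyError: no 'asset' key; TypeError: unexpected shape
            print(f"Skipping entry {position}: {type(err).__name__}")
            broken.append(position)
            continue
        good.append(asset)
    return good, broken
```

Recording the positions of the broken entries, instead of silently dropping them, makes it easier to track down which review link needs fixing.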

Thanks!

A couple questions:
-Is there a check I can run at the start of the loop to look for / skip broken links?
-Is it possible to do the same type of scraping for presentation links? I went to the API Reference Docs (https://developer.frame.io/api/reference/) and received 404 File Not Found though I had remembered there was a "https://api.frame.io/v2/projects/" + project_id + "/presentations" call available. From there, is there a way to scrape the presentation link data to see what assets are associated with the link?

Thanks for your help!

Heya!

There sure is, check out this API endpoint. You’d be looking in the response for the presentation_items object.

Hi again,
I’m still plugging away at trying to scrape review link info. I have an ok solution, but was hoping to really scrape what’s in the review links specifically. I attempted to copy the asset scraper function, but to no avail.

My function is this:

def scrape_review_link_data(
    client: FrameioClient,
    rev_lnk_asset_id: str, #is this project_id instead for this?
    review_link_list: List[Dict]
) -> List[Dict]:
    """
    Takes an initialized client and an asset_id or project_id maybe representing a position in a directory tree.
    Recursively builds a list of review link data, maybe.  Returns a list of dicts.
    """
    review_link_assets = items_in_review_link
    review_link_list = []

    for rev_lnk_asset in review_link_assets:

            if rev_lnk_asset[0][0]["asset"]["type"] == "folder" and rev_lnk_asset != []:
                # Include non-empty folders in the list of scraped assets
                review_link_list.append(rev_lnk_asset)
                scrape_review_link_data(client, rev_lnk_asset["id"], review_link_assets)

            if rev_lnk_asset[0][0]["asset"]["type"] == "file":
                review_link_list.append(rev_lnk_asset)

            if rev_lnk_asset[0][0]["asset"]["type"] == "version_stack":
                versionz = items_in_review_link(rev_lnk_asset["id"])
                review_link_list.append(rev_lnk_asset)
                for vz_asset in versionz:
                    review_link_list.append(vz_asset)


    return review_link_list

items_in_review_link is a list of dictionaries I created and is part of the original solve for this. However, when I run the function I get a "KeyError: 0."

The way I have it now, I scrape the assets in a project. And then I find all the review links in a project. And then I go through and try to match them up. I would really love to just have a function tell me what assets belong with which review links. 

Any help would be appreciated! 
Nancy

Hiya!

I put together a code sample yesterday for scraping assets from review links. It’s not 100% what you’re looking for yet, but I think it’ll help you better understand what it will take.

One thing I might add later today is a “flattener” function to flatten the nested children here.

Unfortunately there is no way via our API to just “check” if a given asset belongs to any review links at this time.

from pprint import pprint
from frameioclient import FrameioClient
from typing import Dict, List

def scrape_review_link_data(
    client: FrameioClient,
    review_link_id: str,
) -> List[Dict]:
    """
    Takes an initialized client and a review link ID.
    Builds a list of the assets in the review link, recursing into each root item.  Returns a list of nested dicts.
    """
    full_asset_list = []

    review_link_root_assets = client.review_links.get_assets(review_link_id)

    for folder in review_link_root_assets:
        temp_assets = client.helpers.get_assets_recursively(folder['asset_id'])
        for asset in temp_assets:
            full_asset_list.append(asset)

    for rev_lnk_asset in full_asset_list:
        # pprint(rev_lnk_asset)
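        print(f"Type: {rev_lnk_asset['type']}, Name: {rev_lnk_asset['name']}")

    # Return the collected assets so the result matches the List[Dict] annotation
    return full_asset_list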

if __name__ == "__main__":
    token = "[YOUR_TOKEN]"
    review_link_id = "62dee239-4678-d0db-9ae9-79a8e6e9eea4"

    client = FrameioClient(token)
    scrape_review_link_data(client, review_link_id)
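
The “flattener” mentioned above could look roughly like this. It is a sketch that assumes nested assets live under a 'children' key, as in the payloads earlier in the thread; flatten_assets is a hypothetical name, not a frameioclient function.

```python
from typing import Dict, List


def flatten_assets(assets: List[Dict]) -> List[Dict]:
    """Flatten nested 'children' lists into one flat list, parents first."""
    flat = []
    for asset in assets:
        # Copy the asset without its children so each row stands alone
        flat.append({k: v for k, v in asset.items() if k != 'children'})
        # Recurse into the children (a missing or None value is treated as empty)
        flat.extend(flatten_assets(asset.get('children') or []))
    return flat
```

Each dict in the flattened list can then become one row in a spreadsheet, without any nested lists to unpack.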

If you’d like, you can grab some time with me via my Calendly to go over in more detail what it is you’re trying to achieve, and maybe I can help!

Thanks again, Jeff.
This is slowly getting me there. I eventually hit an error on the “type” key, and I think I’m missing the correct cutoff in my code. But for the theory behind what I’m trying to accomplish, this helps out so much.