Hello,
I’m working on a script to scrape the data in review links so I can share the assets that are in review links on a csv with my team.
I’ve gotten to the point where I have a list of dictionaries with all the information I want. I’m quite sure I’m just missing a glaringly obvious thing with iterating through lists and dictionaries, but I’ve been staring at the same thing for so long, I thought I would ask for some assistance.
Looking at this info, I’m hoping to dig into the children and get a few pieces for the spreadsheet:
children_in_review_link = {}
for i in range(len(items_in_review_link)):
    #pprint.pprint(items_in_review_link[i])
    if items_in_review_link[i][0]['asset']['children'] != []:
        review_link_children.append(items_in_review_link[i][0]['asset']['children'])
        print(items_in_review_link[i][0]['asset']['children'])
        for j in range(len(items_in_review_link[i][0]['asset']['children'])):
            children_in_review_link = {
                'child_name': items_in_review_link[i][0]['asset']['children'][j]['name'],
                'child_type': items_in_review_link[i][0]['asset']['children'][j]['type'],
                'child_size': items_in_review_link[i][0]['asset']['children'][j]['filesize'],
                'child_filetype': items_in_review_link[i][0]['asset']['children'][j]['filetype'],
            }
            review_link_children.append(children_in_review_link)
pprint.pprint(review_link_children)
and it indeed does not work! I get a “list index out of range” error for the line if items_in_review_link[i][0]['asset']['children'] != []:
I had put in the pprint.pprint(items_in_review_link[i][0]) line as a check, and it kicks back the same “list index out of range” error on the last item in the range.
Any direction / help would be greatly appreciated!
Are all the assets in a list under the children key?
If so could you just run something like this?
review_link_children = []
for item in overall_review_payload:
    children_list = item['asset']['children']
    for child in children_list:
        # Reset each time so a failed lookup doesn't re-append the previous child
        children_in_review_link = None
        try:
            children_in_review_link = {'child_name': child['name'], 'child_type': child['type']}
        except KeyError as e:
            print("Doing something with my key errors")
        if children_in_review_link:
            review_link_children.append(children_in_review_link)
Not sure of any of the quirks you’re seeing in the payload, but this seems like the simple way of pulling those data points and placing them in a list
Maybe you could add something to check if the children key list isn’t empty too
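A minimal sketch of that empty check, assuming each payload item is a dict with an 'asset' dict whose 'children' list may be missing or empty. The function name collect_children and the sample payload are made up for illustration; only the key names come from the code above.

```python
from typing import Dict, List


def collect_children(payload: List[Dict]) -> List[Dict]:
    """Collect name/type for every child, skipping items without children."""
    collected = []
    for item in payload:
        # .get() returns a default instead of raising KeyError;
        # "or []" also covers a 'children' value of None
        children = item.get('asset', {}).get('children') or []
        for child in children:
            if 'name' not in child or 'type' not in child:
                continue  # skip children missing the fields we need
            collected.append({'child_name': child['name'], 'child_type': child['type']})
    return collected


# Hypothetical sample data, not a real API response
sample_payload = [
    {'asset': {'children': [{'name': 'cut_v1.mov', 'type': 'file'}]}},
    {'asset': {'children': []}},
    {'asset': {}},
]
print(collect_children(sample_payload))
```

Items with an empty or absent children list simply contribute nothing, so no separate != [] comparison is needed.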
Just wanted to say an actual thanks for helping out.
I discovered that there was a broken review link and that’s why I kept getting range errors.
My next step will be a more sophisticated try / except for finding those!
A couple questions:
-Is there a check I can run at the start of the loop to look for / skip broken links?
-Is it possible to do the same type of scraping for presentation links? I went to the API Reference Docs (https://developer.frame.io/api/reference/) and received 404 File Not Found though I had remembered there was a "https://api.frame.io/v2/projects/" + project_id + "/presentations" call available. From there, is there a way to scrape the presentation link data to see what assets are associated with the link?
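For the first question, one possible guard is sketched below. It assumes a broken review link shows up as an empty entry in items_in_review_link, which would explain the “list index out of range” on [0]; that assumption, the function name usable_entries, and the sample data are illustrative and not confirmed by the API.

```python
from typing import Any, List


def usable_entries(items: List[Any]) -> List[Any]:
    """Return only entries whose first element is a dict with an 'asset' key."""
    good = []
    for index, entry in enumerate(items):
        if not isinstance(entry, list) or not entry:
            print(f"Skipping entry {index}: empty or not a list")
            continue
        first = entry[0]
        if not isinstance(first, dict) or 'asset' not in first:
            print(f"Skipping entry {index}: no 'asset' key")
            continue
        good.append(entry)
    return good


# Hypothetical sample data
sample_items = [
    [{'asset': {'children': []}}],
    [],  # stands in for a broken review link (assumption)
]
print(len(usable_entries(sample_items)))
```

Running the main loop over usable_entries(items_in_review_link) instead of the raw list keeps the skip logic in one place and logs which positions were dropped.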
Hi again,
I’m still plugging away at trying to scrape review link info. I have an ok solution, but was hoping to really scrape what’s in the review links specifically. I attempted to copy the asset scraper function, but to no avail.
My function is this:
def scrape_review_link_data(
    client: FrameioClient,
    rev_lnk_asset_id: str, #is this project_id instead for this?
    review_link_list: List[Dict]
) -> List[Dict]:
    """
    Takes an initialized client and an asset_id or project_id maybe representing a position in a directory tree.
    Recursively builds a list of review link data, maybe. Returns a list of dicts.
    """
    review_link_assets = items_in_review_link
    review_link_list = []
    for rev_lnk_asset in review_link_assets:
        if rev_lnk_asset[0][0]["asset"]["type"] == "folder" and rev_lnk_asset != []:
            # Include non-empty folders in the list of scraped assets
            review_link_list.append(rev_lnk_asset)
            scrape_review_link_data(client, rev_lnk_asset["id"], review_link_assets)
        if rev_lnk_asset[0][0]["asset"]["type"] == "file":
            review_link_list.append(rev_lnk_asset)
        if rev_lnk_asset[0][0]["asset"]["type"] == "version_stack":
            versionz = items_in_review_link(rev_lnk_asset["id"])
            review_link_list.append(rev_lnk_asset)
            for vz_asset in versionz:
                review_link_list.append(vz_asset)
    return review_link_list
items_in_review_link is a list of dictionaries I created and is part of the original solve for this. However, when I run the function I get a "KeyError: 0."
The way I have it now, I scrape the assets in a project. And then I find all the review links in a project. And then I go through and try to match them up. I would really love to just have a function tell me what assets belong with which review links.
Any help would be appreciated!
Nancy
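The matching step described above (scrape the assets, find the review links, pair them up) can be sketched as a local lookup table. This is only an illustration: it assumes the assets for each review link have already been collected into a dict keyed by review link ID and that each asset dict carries an 'id' key. The function name index_links_by_asset and all IDs below are invented.

```python
from collections import defaultdict
from typing import Dict, List


def index_links_by_asset(assets_by_link: Dict[str, List[Dict]]) -> Dict[str, List[str]]:
    """Invert {review_link_id: [asset, ...]} into {asset_id: [review_link_id, ...]}."""
    links_by_asset = defaultdict(list)
    for link_id, assets in assets_by_link.items():
        for asset in assets:
            links_by_asset[asset['id']].append(link_id)
    return dict(links_by_asset)


# Hypothetical IDs for illustration only
sample_assets_by_link = {
    'link-a': [{'id': 'asset-1'}, {'id': 'asset-2'}],
    'link-b': [{'id': 'asset-2'}],
}
index = index_links_by_asset(sample_assets_by_link)
print(index.get('asset-2', []))
```

Once the index is built, asking which review links contain a given asset is a single dictionary lookup rather than a second pass over every link.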
I put together a code sample yesterday for scraping assets from review links. It’s not 100% what you’re looking for yet, but I think it’ll help you better understand what it will take.
One thing I might add later today is a “flattener” function to flatten the nested children here.
Unfortunately there is no way via our API to just “check” if a given asset belongs to any review links at this time.
from pprint import pprint
from typing import Dict, List

from frameioclient import FrameioClient


def scrape_review_link_data(
    client: FrameioClient,
    review_link_id: str,
) -> List[Dict]:
    """
    Takes an initialized client and a review link ID.
    Fetches the root items in the review link, then recursively collects the assets under each one. Returns a list of nested dicts.
    """
    full_asset_list = []
    review_link_root_assets = client.review_links.get_assets(review_link_id)
    for folder in review_link_root_assets:
        temp_assets = client.helpers.get_assets_recursively(folder['asset_id'])
        for asset in temp_assets:
            full_asset_list.append(asset)
    for rev_lnk_asset in full_asset_list:
        # pprint(rev_lnk_asset)
        print(f"Type: {rev_lnk_asset['type']}, Name: {rev_lnk_asset['name']}")
    return full_asset_list  # matches the declared List[Dict] return type


if __name__ == "__main__":
    token = "[YOUR_TOKEN]"
    review_link_id = "62dee239-4678-d0db-9ae9-79a8e6e9eea4"
    client = FrameioClient(token)
    scrape_review_link_data(client, review_link_id)
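The “flattener” mentioned above could look roughly like the sketch below. It assumes each asset dict may carry a 'children' list of further asset dicts, as the nested output suggests; the function name flatten_assets and the sample tree are made up, and the real payload may nest differently.

```python
from typing import Dict, Iterator, List


def flatten_assets(assets: List[Dict]) -> Iterator[Dict]:
    """Yield every asset in a nested tree, parents before their children."""
    for asset in assets:
        children = asset.get('children') or []
        # Emit a copy without the nested list so each row stands alone
        yield {key: value for key, value in asset.items() if key != 'children'}
        yield from flatten_assets(children)


# Hypothetical nested data for illustration
sample_tree = [
    {'name': 'Dailies', 'type': 'folder', 'children': [
        {'name': 'shot_010.mov', 'type': 'file'},
        {'name': 'shot_020', 'type': 'version_stack', 'children': [
            {'name': 'shot_020_v2.mov', 'type': 'file'},
        ]},
    ]},
]
flat = list(flatten_assets(sample_tree))
print([asset['name'] for asset in flat])
```

Each flattened dict has the same keys as the original asset minus 'children', which makes the result straightforward to hand to something like csv.DictWriter.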
If you’d like, you can grab some time with me via my Calendly to go over in more detail what it is you’re trying to achieve, and maybe I can help!
Thanks again, Jeff.
This is slowly getting me there. I throw an error eventually with the “type” key and I think I’m missing the correct cutoff in my code. But for the theory for what I’m trying to accomplish, this helps out so much.
Hi @jhodges (and everyone else)!
I was trying to create some review links and used the code in the API page. The review link was created, but it didn’t have anything in it! Do I need to follow up creating the link with putting assets in it? I get that presentation links and review links are very different on the back-end, so I wanted to check before I went down a rabbit hole of misunderstanding. I created presentation links using the p-link code on the API page and it worked like a dream!