Version Date: 2020 Mar 10
All of the notebooks in these courses are written to run locally on your computer under a Jupyter notebook server. If you wish to run the notebooks in Watson Studio in the IBM Cloud, you will need to make some modifications to each notebook.
Why? Because once you import a course notebook and the data files for that notebook into a Watson Studio project, the data files are no longer available to the notebook!
This happens because the imported data files are stored in an IBM Cloud Object Storage (COS) bucket, and the notebook does not have direct access to the objects in that bucket. If you import a notebook and its data files into a Watson Studio project and then try to run it, the notebook will return "File not found" errors.
In order to make the data files available to your notebook, you will need to run some code in your notebook to:
This whole process LOOKS complicated but it really isn't. After you've done it a couple of times, it can be done quickly and easily without looking at the instructions.
What you'll be doing here is putting code into your notebook that gets a named file from a Cloud Object Storage (COS) bucket, reads it as a byte stream, then creates a directory and a file to write that byte stream to.
The most confusing part of all of this is the relative locations of the directories that you are creating in the virtual disk used by your notebook. Keep these points in mind:
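The notebook's working directory is /home/dsxuser/work.
The directories you create with the code below are placed under /home/dsxuser.
Your notebook can refer to a file using either a relative path (e.g., ../data/aavail-customers.csv) or an absolute path (e.g., /home/dsxuser/data/aavail-customers.csv).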
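To see why the relative and absolute forms point at the same file, the short sketch below resolves both against the working directory. It assumes the notebook's working directory is /home/dsxuser/work, as the paths above suggest; the helper name resolve is made up for this illustration.

```python
import posixpath

# Assumed working directory of the notebook kernel (taken from the paths above).
WORK_DIR = "/home/dsxuser/work"

def resolve(path, cwd=WORK_DIR):
    """Return the absolute, normalized form of a path as seen from cwd."""
    # posixpath.join discards cwd when path is already absolute.
    return posixpath.normpath(posixpath.join(cwd, path))

relative = resolve("../data/aavail-customers.csv")
absolute = resolve("/home/dsxuser/data/aavail-customers.csv")

print(relative)              # /home/dsxuser/data/aavail-customers.csv
print(relative == absolute)  # True
```

Inside a running notebook, os.getcwd() shows the actual working directory if you want to confirm the assumption.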
Step 0: These steps assume you have already done ALL of the following:
Step 1: From the "My Projects" page in Watson Studio Cloud, click on the link for the project containing your notebook and data files.
Step 2: On the project overview page, click on the "Settings" link located near the top center of the page.
Step 3: On the Settings page, scroll down to the "Access Tokens" section.
Step 4: On the right-hand side of the Access Tokens section, click the + sign link for "New Token".
Step 5: On the New Token popup, enter a name for your token (e.g., "Tokey McTokenface") and set "Access role for project" to "Editor".
Step 6: Click the "Create" button.
Step 7: Scroll back to the top of the page and click on the "Assets" link.
Step 8: On the Assets page, locate the row listing the notebook you wish to work with and click on the pencil icon on the right side of the row to edit the notebook.
Step 9: On the project menu, locate the More menu item (indicated by three vertically stacked dots) near the top right of the page and click on it.
Step 10: Click on the menu item "Insert Project Token".
You should now see code inserted into your notebook that resembles this:
Step 11: Insert a new cell under the project token cell shown above and paste the following code into it:
# START CODE BLOCK
import os

# cos2file - takes an object from Cloud Object Storage and writes it to a file on the container file system.
# Uses the IBM project_lib library.
# See https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/project-lib-python.html
# Arguments:
# p: project object defined in project token
# data_path: the directory to write the file to (relative to the project home)
# filename: name of the file in COS
def cos2file(p, data_path, filename):
    data_dir = p.project_context.home + data_path
    # Create the target directory if it does not exist yet.
    os.makedirs(data_dir, exist_ok=True)
    # Read the asset as bytes and write it to disk; "with" closes the file.
    with open(data_dir + '/' + filename, 'wb') as f:
        f.write(p.get_file(filename).read())

# file2cos - takes a file on the container file system and writes it to an object in Cloud Object Storage.
# Uses the IBM project_lib library.
# See https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/project-lib-python.html
# Arguments:
# p: project object defined in project token
# data_path: the directory to read the file from (relative to the project home)
# filename: name of the file on the container file system
def file2cos(p, data_path, filename):
    data_dir = p.project_context.home + data_path
    path_to_file = data_dir + '/' + filename
    if os.path.exists(path_to_file):
        # Open in binary mode and hand the file object to project_lib.
        with open(path_to_file, 'rb') as file_object:
            p.save_data(filename, file_object, set_project_asset=True, overwrite=True)
    else:
        print("file2cos error: File not found")
# END CODE BLOCK
This code contains two user-defined Python functions:
cos2file: Get an asset stored in Cloud Object Storage (COS) and write the data to a disk file in the notebook container.
file2cos: Get a disk file in the notebook container and write it to an asset stored in Cloud Object Storage (COS).
The arguments for both of these functions are:
p: The project object defined by the project token, called "project" by default.
data_path: The directory path, relative to /home/dsxuser, that will contain the files.
filename: The filename that the data should be written to or read from.
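If you want to try the two functions on your own computer before using them in Watson Studio, one option is a dry run against a stand-in project object. The sketch below is only an illustration: FakeProject is a hypothetical class, not part of project_lib, and it imitates only the three members the functions rely on (project_context.home, get_file, and save_data). The sample CSV content and the file name summary.txt are made-up placeholders.

```python
import io
import os
import tempfile
from types import SimpleNamespace

class FakeProject:
    """Hypothetical stand-in for the project object; NOT part of project_lib."""

    def __init__(self, home, assets):
        self.project_context = SimpleNamespace(home=home)
        self.assets = dict(assets)  # asset name -> bytes, held in memory

    def get_file(self, filename):
        # Mimics returning a readable byte stream for a stored asset.
        return io.BytesIO(self.assets[filename])

    def save_data(self, filename, data, set_project_asset=True, overwrite=True):
        # Accepts a file object or raw bytes and keeps the content in memory.
        self.assets[filename] = data.read() if hasattr(data, "read") else data

def cos2file(p, data_path, filename):
    data_dir = p.project_context.home + data_path
    os.makedirs(data_dir, exist_ok=True)
    with open(data_dir + '/' + filename, 'wb') as f:
        f.write(p.get_file(filename).read())

def file2cos(p, data_path, filename):
    path_to_file = p.project_context.home + data_path + '/' + filename
    if os.path.exists(path_to_file):
        with open(path_to_file, 'rb') as f:
            p.save_data(filename, f, set_project_asset=True, overwrite=True)
    else:
        print("file2cos error: File not found")

# A temporary directory plays the role of /home/dsxuser.
home = tempfile.mkdtemp()
project = FakeProject(home, {"aavail-customers.csv": b"id,name\n1,Ada\n"})

# COS -> disk: creates <home>/data and writes the asset into it.
cos2file(project, "/data", "aavail-customers.csv")
print(os.listdir(home + "/data"))

# disk -> COS: write a new file locally, then store it as an asset.
with open(home + "/data/summary.txt", "wb") as f:
    f.write(b"1 customer\n")
file2cos(project, "/data", "summary.txt")
print(sorted(project.assets))
```

In Watson Studio itself you would skip FakeProject entirely and pass the real project object created by the project token cell.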
Step 12: Create a new empty cell under the one created in Step 11 and enter code to call the function cos2file() to retrieve a file from Cloud Object Storage.
You are given a notebook called m2-u2-data-visualization.ipynb. That notebook makes use of three files:
../data/aavail-customers.csv
../data/customer-stream-data.db
../scripts/make-pretty-graph.py
First, create the project in Watson Studio, import the notebook, store the three files aavail-customers.csv, customer-stream-data.db, and make-pretty-graph.py as assets in the same project, and create a new project token.
Open the notebook in Edit mode and insert the project token as described above, then paste the code from Step 11 above into a new cell under the one containing the project token.
Create another new cell under the Step 11 code cell and enter the following code:
cos2file(project, '/data', 'aavail-customers.csv')
cos2file(project, '/data', 'customer-stream-data.db')
cos2file(project, '/scripts', 'make-pretty-graph.py')
These three calls to the cos2file() function create the directories /home/dsxuser/data and /home/dsxuser/scripts, write the files aavail-customers.csv and customer-stream-data.db to /home/dsxuser/data, and write the file make-pretty-graph.py to /home/dsxuser/scripts.
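Before running the rest of the notebook, it can help to confirm that the files landed where the notebook expects them. The check below is a minimal sketch; missing_files is a hypothetical helper written for this illustration, and /home/dsxuser is assumed to be the home directory as described above.

```python
import os

def missing_files(home, expected):
    """Return the expected relative paths that do not exist under home."""
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(home, rel))]

EXPECTED = [
    "data/aavail-customers.csv",
    "data/customer-stream-data.db",
    "scripts/make-pretty-graph.py",
]

# In Watson Studio the home directory is assumed to be /home/dsxuser.
# An empty list means all three files are in place.
print(missing_files("/home/dsxuser", EXPECTED))
```

If the list is not empty, re-run the corresponding cos2file() call and check that the asset name matches the file name in the project exactly.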
Some notebooks in this course are written to store files in directories that don't exist yet in Watson Studio. If you ever need to create an empty directory that your notebook can write to, just use the cos2file() function to store any of your data assets in a directory with the required name.
For example, calling the function with cos2file(project, '/images', 'aavail-streams.csv') will create the directory /home/dsxuser/images and write the CSV file to it. You can ignore that CSV file; the notebook can then read from and write to the new /home/dsxuser/images directory.
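An alternative that avoids copying an unneeded file is to create the directory directly with Python's os.makedirs. This is a sketch rather than part of the course instructions: ensure_dir is a hypothetical helper, and it assumes the notebook is allowed to create directories under its home directory. The demonstration uses a temporary directory so it can run anywhere.

```python
import os
import tempfile

def ensure_dir(home, data_path):
    """Create home + data_path if it does not exist and return the full path."""
    target = home + data_path
    # exist_ok=True makes the call safe to repeat.
    os.makedirs(target, exist_ok=True)
    return target

# Demonstration with a temporary directory standing in for the home directory.
demo_home = tempfile.mkdtemp()
images_dir = ensure_dir(demo_home, "/images")
print(os.path.isdir(images_dir))  # True
```

In Watson Studio the equivalent call would be ensure_dir(project.project_context.home, '/images'), using the same project object that is passed to cos2file().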