Projects with default Git integration

In a project with default Git integration, you always have your own view of the project based on the contents of your local Git clone. All project assets that are listed in the project reflect the current state of your Git clone.

Since you work in your local Git clone, the same Git repository can be associated with different projects within a single instance of the experience. It can also be associated with projects across multiple instances of the experience.

There is no restriction on the directory structure for code in the Git repository, nor where and how changes are made.

Collaboration

If you want to work with others on the same contents of files in a particular Git repository, you can add those users as collaborators to your project. Those users do not have to create their own projects based on the same Git repository. They can work and test in their own clones of the repository and then merge their changes when their code is ready. By adding collaborators to your project, you can easily track who is working on it. This means that you don’t need to check the Git user interface to see who is committing changes.

In addition to being added as collaborators, users must have their own access token for the associated repository.

  1. Add users as collaborators to the project and assign them either Admin or Editor role. You can invite only users who have an existing IBM Cloud Pak for Data account. See Adding collaborators.
  2. Give all collaborators the appropriate access permissions to the Git repository.
  3. Collaborators are asked to create and submit their own personal access token when they pull the Git branch for their local clone. See Creating personal access tokens for a Git repository.

Tools and assets that you can use in projects with default Git integration

Note: JupyterLab and RStudio assets are visible only in the IDE, not in the assets table.
Tools support
Tool Support in default Git projects Support for project import
AutoAI (Watson Machine Learning) Note: AutoAI does not support pushing to the remote repository.
Data Refinery
Decision Optimization
JupyterLab Note: Use JupyterLab to create and manage notebooks.
RStudio Note: Use RStudio to create and manage notebooks.
SPSS Modeler
Synthetic Data Generator
Orchestration Pipelines
Assets support
Asset Support in default Git projects Support for project import
Connections ✓ See Connecting to data sources
Connected data
Connected folder assets
Decision Optimization experiments
Deep Learning experiments
Data assets
Data Refinery flows
Jobs
SPSS Modeler flows
Synthetic Data Generator flows
Models from file
Visualizations

You can’t perform any of the following actions in projects with default Git integration:

  • Deploy to space
  • Export project
  • Import assets into non-empty project

If you work in a project with default Git integration, the Git repository might contain assets that were added from another project that uses the same Git repository. You cannot necessarily work with all of the assets that are pulled from the Git repository. For more information, see Troubleshooting for Watson Studio.

By selecting Local Git data, you can create a data asset from any file in your local clone. For example, if you run a notebook that generates a .csv file, you can select that file to make it a data asset, which you can then refine by using Data Refinery.
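
The notebook scenario above can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name, the sample rows, and the output/cars_summary.csv path are hypothetical. Any file that your notebook writes into the clone could be selected through Local Git data in the same way.

```python
import csv
from pathlib import Path


def write_results_csv(rows, out_path):
    """Write a list of dict rows to a CSV file and return its path."""
    out_path = Path(out_path)
    # Create the target folder inside the clone if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0].keys())
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Hypothetical data and file name, for illustration only.
rows = [
    {"model": "sedan", "mpg": 31.5},
    {"model": "truck", "mpg": 18.2},
]
csv_path = write_results_csv(rows, "output/cars_summary.csv")
```

After the file exists in the clone, you would still create the data asset from the project user interface. Writing the file does not register it as an asset.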

For example, when you save a Data Refinery flow, a .flow file that contains the flow itself is saved, and the project also creates an asset that points to that flow and holds its metadata. Similarly, when you upload a data file, the file is uploaded to the project data folder and a data asset is created for that file.

Project assets and their metadata are stored in the following well-defined locations inside the Git repository:

  • assettypes: contains a set of JSON files that define the types and other characteristics of the assets. The set of files, if any, that exists in this folder depends on the set of services that have been installed.

  • assets: contains any files relevant to the asset and a metadata file with the user-specified information (like a description). There is a folder for each type of asset, with a .METADATA folder that contains the JSON files with the metadata.

    For example, for a data asset and a saved model, you would see:

    assets/.METADATA
    assets/.METADATA/wml_model.mymodel1.json
    assets/.METADATA/data_asset.cars.json
    assets/data_asset/cars.csv
    assets/wml_model/mymodel1/7ca4e02d-fe0b-4832-921e-448bf05f435e
    assets/wml_model/mymodel1/3bbb4b08-2d84-4099-8d90-7e9f4fb496f5
    

    You can edit files, for example the metadata JSON files, to update the description of an asset. However, you must be cautious when you're editing these files as the metadata required for each type of asset is different and not documented, which could result in unexpected behavior if your changes are not valid. You should never manually delete files in these directories. Instead, delete assets only by using the project user interface.

    There is no auto discovery for newly added assets. For example, if you add valid model files to ./wml_model (and don't use the project user interface), the models will not be registered as assets in the project.

    When you push updates to the external Git repository, always include all files under the assettypes and assets directories, including assets/.METADATA. These files are needed to manage project assets consistently for all collaborators in all the Git branches.
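
To check which assets are registered in a clone, you could read the file names in assets/.METADATA. The following sketch assumes that metadata files follow the "<asset type>.<asset name>.json" naming pattern seen in the example listing above. That pattern is inferred from the example, not documented, and the function name is hypothetical. The sketch only reads file names and does not open or modify the metadata files.

```python
from pathlib import Path


def list_asset_metadata(clone_root):
    """Return (asset_type, asset_name) pairs found in assets/.METADATA."""
    metadata_dir = Path(clone_root) / "assets" / ".METADATA"
    entries = []
    if not metadata_dir.is_dir():
        return entries
    for path in sorted(metadata_dir.glob("*.json")):
        # Assumed naming pattern: "<asset type>.<asset name>.json".
        asset_type, _, asset_name = path.stem.partition(".")
        entries.append((asset_type, asset_name))
    return entries
```

For the example listing above, this would yield the pairs ("data_asset", "cars") and ("wml_model", "mymodel1"). A read-only check like this avoids the risks of editing or deleting files in these directories by hand.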

Notebooks and scripts

Notebooks and scripts are not project assets in a default Git project and do not have associated metadata that is maintained by Watson Studio. Instead, notebooks and scripts are arbitrary code files. There is also no asset versioning inside a default Git project. Version control is done through the versioning inherent in the Git repository.

You develop and test notebooks and scripts in JupyterLab and RStudio. There is no restriction on the Git directory structure that you use, nor on the Git operations you perform.

Also, you have full control of the .gitignore file contents in your clone for files you don't want to persist in the Git repository. A default .gitignore file is included at the time that you create the project that ignores core files and job run information (metadata file and logs, like assets/.METADATA/job_run.* and assets/job_run files). If you want to ignore other files, you should add those files to the default .gitignore file and not use your own .gitignore file.
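
One way to extend the default .gitignore file without replacing it is to append only the patterns that are not already listed. The following is a sketch under that assumption. The function name and the example patterns are hypothetical, and the actual contents of the default file in your project might differ.

```python
from pathlib import Path


def add_ignore_patterns(clone_root, patterns):
    """Append patterns missing from the existing .gitignore; return those added."""
    gitignore = Path(clone_root) / ".gitignore"
    existing_text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    existing = {line.strip() for line in existing_text.splitlines()}
    added = []
    for pattern in patterns:
        if pattern not in existing:
            added.append(pattern)
            existing.add(pattern)
    if added:
        # Keep the default entries untouched; only append to the end of the file.
        needs_newline = bool(existing_text) and not existing_text.endswith("\n")
        with gitignore.open("a", encoding="utf-8") as f:
            f.write(("\n" if needs_newline else "") + "\n".join(added) + "\n")
    return added
```

For example, add_ignore_patterns(".", ["*.tmp", "scratch/"]) would leave the default entries in place and append the two hypothetical patterns if they are not already present.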

Python functions are currently not supported in projects with default Git integration.

Jobs for scripts or notebooks

You can create a job from the Jobs page of your project by selecting New job. Then, browse to the script or notebook that you want to use as the entry point for the job.

When the job starts, the full contents of your Git clone are available (mounted). The notebook or script that you selected as the entry point can call any other scripts or notebooks in your clone, which in turn can call other files in the project. See Creating code-based jobs.
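
As an illustration of an entry point that calls other code in the clone, the following sketch uses the Python standard library module runpy to run a second script by its path relative to the clone root. The function name and the helpers/prepare_data.py path are hypothetical. How you locate the clone root inside a job is an assumption here and depends on your environment.

```python
import runpy
from pathlib import Path


def run_clone_script(clone_root, relative_path):
    """Run another script from the clone and return its global variables."""
    script = (Path(clone_root) / relative_path).resolve()
    if not script.is_file():
        raise FileNotFoundError(f"No such script in clone: {relative_path}")
    # run_path executes the file and returns the resulting module globals.
    return runpy.run_path(str(script), run_name="__main__")


# Hypothetical usage inside an entry-point script:
# clone_root = Path(__file__).resolve().parent
# result = run_clone_script(clone_root, "helpers/prepare_data.py")
```

Resolving paths from a known root, instead of relying on the current working directory, keeps the entry point independent of where the job happens to start.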