Projects with default Git integration

In an analytics project with default Git integration, you always have your own view of the project based on the contents of your local Git clone. All project assets that are listed in the project reflect the current state of your Git clone.

Since you work in your local Git clone, the same Git repository can be associated with different analytics projects across a single Cloud Pak for Data instance and across multiple Cloud Pak for Data instances.

There is no restriction on the directory structure for code in the Git repository, nor on where and how changes are made.

Collaboration

If you want to work with others on the same files in a particular Git repository, you can add those users as collaborators to your project. Those users do not have to create their own projects based on the same Git repository. They can work and test in their own clones of the repository and then merge their changes when their code is ready. By adding collaborators to your project, you can easily track who is working on the same project without having to go to the Git user interface to see who is committing changes.

To enable sharing when working on files, users must be added to the project as collaborators and must have their own access token for the associated repository.

  1. Add users as collaborators to the project and assign them either the Admin or the Editor role. You can invite only users who have an existing IBM Cloud Pak for Data account. See Adding collaborators.
  2. Give all collaborators the appropriate access permissions to the Git repository.
  3. Collaborators are asked to create their own personal access token when they pull the Git branch for their local clone. See Creating personal access tokens for a Git repository.

Assets in projects with default Git integration

Asset types that you add to the project from the project's action bar are all project assets. You can add the following assets to a project with default Git integration:

Credentials in connections are implicitly treated as Personal credentials. There is no support for Shared credentials.

Note that you can't add assets from a catalog to a project with default Git integration.

By selecting Local Git data, you can create a data asset from any file in your local clone. For example, if you run a notebook that generates a .csv file, you can use this option to turn that file into a data asset, which you can then refine using Data Refinery.
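As a minimal sketch of the notebook side of this example, the following Python function writes a .csv file to a path inside the clone so that the file could later be picked through Local Git data. The function name write_results_csv, the data/results.csv path, and the column names are hypothetical and chosen only for illustration; the directory layout is up to you.

```python
import csv
from pathlib import Path


def write_results_csv(clone_dir, relative_path, header, rows):
    """Write rows to a CSV file inside the clone and return the file path.

    clone_dir and relative_path are illustrative assumptions; use whatever
    layout your repository follows.
    """
    target = Path(clone_dir) / relative_path
    # Create intermediate folders such as "data/" if they do not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with target.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return target
```

In a notebook, a call such as write_results_csv(".", "data/results.csv", ["name", "score"], [("a", 1), ("b", 2)]) would leave the file in the clone, where it can be selected when you create the data asset.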

For example, when you save a Data Refinery flow, a .flow file that contains the flow itself is saved, and the project also creates an asset that points to that flow so that metadata can be kept for it. Similarly, when you upload a data file, the file is uploaded to the project data folder and a data asset is created for that file.

Project assets and their metadata are stored in the following well-defined locations inside the Git repository:

Notebooks and scripts

Notebooks and scripts are not project assets in a default Git project and have no associated metadata maintained by Watson Studio. Instead, notebooks and scripts are arbitrary code files. There is also no asset versioning inside a default Git project. Version control is done through the versioning inherent in the Git repository.

You develop and test notebooks and scripts in JupyterLab and RStudio. There is no restriction on the Git directory structure you use, nor on the Git operations you perform.

Additionally, you have full control of the contents of the .gitignore file in your clone for files you don't want to persist in the Git repository. A default .gitignore file is included when you create the project. It ignores core files and job run information (metadata files and logs, such as assets/.METADATA/job_run.* and assets/job_run files). If you want to ignore other files, add them to the default .gitignore file rather than using your own .gitignore file.
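One way to extend the default .gitignore file without replacing it is to append only the patterns that are not already listed. The following Python sketch illustrates that idea; the function name add_ignore_patterns and the example patterns are hypothetical, and editing the file by hand achieves the same result.

```python
from pathlib import Path


def add_ignore_patterns(clone_dir, patterns):
    """Append new patterns to the existing .gitignore and return those added.

    Existing lines are kept as they are, so the default entries created
    with the project are preserved.
    """
    gitignore = Path(clone_dir) / ".gitignore"
    existing_text = gitignore.read_text() if gitignore.exists() else ""
    existing = {line.strip() for line in existing_text.splitlines()}

    added = []
    for pattern in patterns:
        pattern = pattern.strip()
        # Skip blanks and anything that is already ignored.
        if pattern and pattern not in existing:
            added.append(pattern)
            existing.add(pattern)

    if added:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if not existing_text or existing_text.endswith("\n") else "\n"
        with gitignore.open("a") as fh:
            fh.write(prefix + "\n".join(added) + "\n")
    return added
```

For example, add_ignore_patterns(".", ["*.tmp", "scratch/"]) run from the root of the clone would add both patterns once, and a second call with the same arguments would change nothing.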

Note that Python functions are currently not supported in projects with default Git integration.

Jobs for scripts or notebooks

You can create a job from the Jobs page of your project by selecting New job and browsing to the script or notebook that you want to use as the entry point for the job.

When the job starts to run, the full contents of your Git clone are available (mounted). This means that the notebook or script that you selected as the entry point can call any other scripts or notebooks in your clone, which in turn can call other files in the project. See Creating code-based jobs.
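The following Python sketch shows one way an entry-point script might call another script from the same clone. It relies only on the standard library; the function name run_from_clone and the idea of deriving the clone root from the entry point's own location are assumptions made for illustration, not a documented mechanism of the product.

```python
import runpy
import sys
from pathlib import Path


def run_from_clone(clone_root, relative_script):
    """Run another script from the clone and return its global namespace."""
    clone_root = Path(clone_root).resolve()
    # Put the clone root on sys.path so the called script can import
    # modules that live elsewhere in the clone.
    if str(clone_root) not in sys.path:
        sys.path.insert(0, str(clone_root))
    # Execute the script as if it had been started directly.
    return runpy.run_path(str(clone_root / relative_script), run_name="__main__")
```

An entry-point script stored at the top level of the repository could call run_from_clone(Path(__file__).resolve().parent, "helpers/prep.py"), where helpers/prep.py is a hypothetical path, and then read values from the returned dictionary.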
