

Gainsight Inc.

S3 Dataset Task in Bionic Rules

This tutorial provides guidelines on how Admins can create a dataset by fetching data from a csv/tsv file in any S3 bucket. These csv/tsv files can be uploaded manually to the S3 bucket or exported to it from another Bionic Rule task. This task allows Admins to create fields in the new dataset from the csv/tsv columns. Once the dataset is created, Admins can apply other tasks and set up actions on it.

This task can fetch data from a csv/tsv file that was exported to the Gainsight managed S3 bucket from another Bionic Rule task. For more information, refer to Export to S3 from Bionic Rules.

Following are the use cases where you can use this Bionic Rule task:

  • You can fetch data from a historical csv/tsv file in the Gainsight managed bucket or any other S3 bucket if you know the exact file name, using the Equals option in the task configuration (Example: CompanyDetails.csv or CompanyDetails-2018-02-20.csv). If you know the format of the file name (which contains the generation date) that was exported to the S3 bucket, you can select a file that was generated on a specific date using the Date Pattern option in the task configuration.
  • While exporting output csv/tsv files to the S3 bucket from another Bionic Rule task, Admins can configure the export as a set of smaller files instead of one big file (Example: larger than 10 GB). In this use case, use the Starts with option in the task configuration to fetch data from all of the smaller files in the S3 bucket and load it into the S3 Dataset.

For more details on the task configurations, refer to the section S3 File Configuration.

Prerequisites

  1. To create an S3 dataset from a csv/tsv file loaded into an S3 bucket, create a connection in the Connectors 2.0 page as shown below:
    1. Navigate to Administration > Connectors 2.0 > Connectors tab.
    2. Click Create Connection on the Amazon Web Services S3 Connector. The Create a Connection dialog appears.
    3. Enter the following:
  • Connection Name: Example, My S3 Custom Bucket
  • Credentials of your S3 bucket, such as Bucket Name, Access Key, and Security Token

    Make sure that you enter valid credentials to establish an S3 connection.
    4. Click TEST CONNECTION to validate the S3 connection. If the credentials for the S3 bucket are correct and valid, the success message Connection Successful is displayed. If not, the error message Connection failed is displayed.
    5. Click CREATE. A connection for your S3 bucket is established, and it appears in the S3 File Configuration section of the Setup Rule page for the S3 Dataset task.

S3 Connection in Connectors 2.0.gif

  2. Before an S3 dataset is created, make sure that the source csv/tsv file is available in the Gainsight Managed S3 bucket.
  3. Ensure that Date and DateTime values in the csv/tsv file are in one of the formats listed in the Date Configuration section.

Creating an S3 Dataset Task

To create an S3 Dataset task:

  1. Navigate to Administration > Rules Engine and click + RULE. The Create Rule page appears.
  2. Provide the following in the Create Rule page:
    1. Select Bionic as the Rule Type.
    2. Select Account as Rule For.
    3. Enter a Rule Name.
    4. Enter a Description [Optional].
  3. Click NEXT. The Setup Rule page appears.
  4. Click +TASK > S3 Dataset. The S3 Dataset configuration page appears. This page has three sections: S3 File Configuration, Columns, and Date Configuration.

Setup Rule.gif

  5. Enter the Task Name, Task Description, and Output Dataset Name in the Setup Rule header.

Note: The Output Dataset Name auto-populates from the Task Name, but it can be changed to a different name. It has no dependency on the Task Name.

In this use case, the following details are used:

  • Task Name: Extract from S3 [Maximum 80 characters and should be Alphanumeric; _ and space are supported]
  • Task Description: Extract data from a file in the S3 bucket [Maximum 200 characters]
  • Output Dataset Name: Extract from S3 [Maximum 60 characters and should be Alphanumeric; _ and space are supported]. This gets pre-populated with the task name by default.
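As a rough illustration, the character and length constraints above could be checked with a small sketch like this (the helper name and regex are ours, not part of the product):

```python
import re

def valid_name(name, max_len):
    """Check the documented constraints: alphanumeric characters,
    underscore, and space only, up to max_len characters."""
    return bool(re.fullmatch(r"[A-Za-z0-9_ ]{1,%d}" % max_len, name))
```

For example, valid_name("Extract from S3", 80) passes, while a name containing a slash or exceeding the limit fails.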

S3_Dataset.png

The S3 Dataset page has three sections: S3 File Configuration, Columns, and Date Configuration, as shown below. Admins configure this page to set up an S3 extract job. Depending on this configuration, columns in the csv/tsv file are extracted to fields in the S3 dataset.

S3 File Configuration

Perform the following configurations in this section:

  1. Select the correct S3 bucket: 
  • Gainsight Managed, if you want to fetch a csv/tsv file from the Gainsight Managed bucket
  • Your S3 custom bucket, if you want to fetch a csv/tsv file from your own S3 bucket

Note: An S3 bucket appears here only if you have established an S3 connection for it in the Connectors 2.0 page, as shown in the Prerequisites.

For a quick demo on how to create a connection for an S3 bucket and use the connection in this section, watch this short video.

  2. In the File Path field, you have the following options for the file name:
    1. Equals: If you know the exact file name in the S3 bucket, with the csv/tsv extension (Example: CompanyDetails.csv or CompanyDetails-2018-02-20.csv), use the Equals option to select that specific file for data loading into the S3 dataset. You can use this option to load a historical file from the S3 bucket. If your csv/tsv file has a .gzip or .bzip extension, select the specific Compression Type in the file properties. Select the other file properties as required for smooth data loading. Gainsight supports the following file properties:
  • Field Separator: , (Comma) as separator for the csv files and (Tab) for the tsv files
  • Text Qualifier: “ (Double Quote) or ‘ (Single Quote)
  • Escape Character: \ (Backslash), ‘ (Single Quote), or “ (Double Quote)
  • Compression Type: gzip or bzip
    2. Starts with: You can use this option only when you select the Use Date Pattern checkbox; it is used in combination with a Date Pattern in the file name. If you enter the partial file name with the date pattern, the matching file is selected for creating the S3 dataset. Select file properties such as Field Separator and Compression Type so that the correct file is read from the S3 bucket. Use this option if you want to ingest data from multiple files with similar file names into the S3 Dataset.

    While exporting output csv/tsv files to the S3 bucket from another Bionic Rule task, if the file is big (Example: larger than 10 GB), Admins can configure the export to divide it into a set of smaller files with similar file names (Example: file 1 of 1.5 GB, file 2 of 2 GB, and so on). These smaller files have the same columns, with the records divided across the files. When you create an S3 Dataset using the Starts with option, records from all of the smaller files are fetched into the S3 dataset.
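The Starts with selection over a set of split files can be sketched as follows. The helper is hypothetical; in practice the key listing would come from an S3 list call (for example, boto3's list_objects_v2), but it is passed in here so the selection logic is easy to see:

```python
def keys_starting_with(all_keys, folder, prefix):
    """Mimic the 'Starts with' option: pick every object key inside
    `folder` whose file name begins with `prefix`.  All matching files
    are then loaded into the same S3 dataset."""
    full_prefix = folder.rstrip("/") + "/" + prefix
    return sorted(k for k in all_keys if k.startswith(full_prefix))
```

With split export files such as CompanyDetails-2018-02-20-1.csv and CompanyDetails-2018-02-20-2.csv in the same folder, both are selected while unrelated files are skipped.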
    3. Use Date Pattern: Enable this checkbox to use a Date Pattern in the file path. If you select it, ${pattern} appears in the file name and can be placed anywhere in the file path. Enter the number of days to subtract from the rule date in the ${pattern} formula. A date pattern in the file name can be applied with both the Equals and Starts with options in the File Path field.

Note: The position of the Date Pattern in the file name should correspond with the names of the files exported to the S3 bucket.

  3. You can use the following combinations of File Path and Date Pattern options for the mentioned use cases:

File Path: In all the conditions below, make sure that you enter the full file path from the S3 bucket. For example, if a file named "CompanyDetails.csv" is placed in the folder MDA-Data-Ingest/input/ of a given S3 bucket and you use the Equals option, the file path is MDA-Data-Ingest/input/CompanyDetails.csv.

  1. Equals: Use this option only if you know the exact file name (with or without the file generation date) in the S3 bucket, with the csv/tsv extension (Example: CompanyDetails.csv or CompanyDetails-2018-02-20.csv). The file path would be MDA-Data-Ingest/input/CompanyDetails.csv.
  2. Equals + Date Pattern: Use this option if you know the file name apart from the file generation date. You can add a Date Pattern to select a specific file for data ingestion. For example, if the rule execution date is 2018-02-24 and you want to ingest data from a file exported to the S3 bucket on 2018-02-20, subtract 4 days from the Rule Date. The file name will be CompanyDetails-${pattern}.csv, where ${pattern} = Subtract 4 days from Rule Date with format yyyy-MM-dd, and the file path will be MDA-Data-Ingest/input/CompanyDetails-${pattern}.csv.
  3. Starts with + Date Pattern: Use this option if you know the file name only partially, without the file generation date. You can add a Date Pattern to select a file or a set of files that match the file path for data ingestion. For example, if the rule execution date is 2018-02-24 and you want to ingest data from one or more files exported to the S3 bucket on 2018-02-20, subtract 4 days from the Rule Date. The file name will be CompanyDetails-${pattern}, where ${pattern} = Subtract 4 days from Rule Date with format yyyy-MM-dd, and the file path will be MDA-Data-Ingest/input/CompanyDetails-${pattern}.
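The ${pattern} substitution described above can be sketched in a few lines. The helper name and the mapping of yyyy-MM-dd to Python's %Y-%m-%d are our assumptions, not product code:

```python
from datetime import date, timedelta

def resolve_file_path(template, rule_date, subtract_days, fmt="%Y-%m-%d"):
    """Replace ${pattern} in a file path with (rule date - N days),
    rendered the way a yyyy-MM-dd date pattern would format it."""
    stamp = (rule_date - timedelta(days=subtract_days)).strftime(fmt)
    return template.replace("${pattern}", stamp)
```

With a rule date of 2018-02-24 and 4 days subtracted, the template MDA-Data-Ingest/input/CompanyDetails-${pattern}.csv resolves to MDA-Data-Ingest/input/CompanyDetails-2018-02-20.csv.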

S3 File Configuration.gif

  4. When you select the Equals option, make sure to add the extensions .csv/.tsv and .bzip/.gz for the file format (comma separated/tab separated) and compression type respectively.
  5. Is the source file Encrypted: If your csv/tsv file in the S3 bucket is encrypted, select this checkbox and the correct PGP key to apply decryption to the file.

Note: To configure a PGP key for decryption with Gainsight, contact support@gainsight.com; once configured, you can select the key here.

  6. When you select the Equals or Starts with option, make sure to select the following File Properties as required:
  • Field Separator: Comma / Tab for the .csv/.tsv files respectively
  • Compression Type: bzip / gzip for the file compression types respectively. Select None if the compression is not applied on the source file in the S3 bucket.

If you do not apply the above configurations correctly or you have provided incorrect S3 bucket details while creating an S3 connection in the Connectors 2.0 page, the Rule execution fails.

Notes:

  • Field Separator: Use , (Comma) as the separator for csv files and (Tab) for tsv files.
  • Text Qualifier: Used to import a value (along with special characters) enclosed in the quotation marks. It is recommended to use the same Text Qualifier in the S3 file configuration that is used in the uploaded csv/tsv file. By default, Double Quote is selected in the S3 file configuration, but users can change it to Single Quote as required.
  • Escape Character: Used to include a special character in a value; it is placed before the special character. It is recommended to use Backslash in the csv/tsv file to avoid any data discrepancy in the S3 Dataset.
  • Compression Type: Use .bzip or .gzip as required.
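These file properties correspond directly to standard CSV parsing options. A minimal sketch using Python's standard library (the helper name is ours) shows how separator, qualifier, escape character, and compression fit together:

```python
import bz2
import csv
import gzip

def read_s3_export(path, field_separator=",", text_qualifier='"',
                   escape_character="\\", compression=None):
    """Read a csv/tsv export using the same file properties the task
    expects: delimiter, text qualifier, escape character, compression."""
    openers = {"gzip": gzip.open, "bzip": bz2.open, None: open}
    with openers[compression](path, mode="rt", newline="") as handle:
        reader = csv.reader(handle,
                            delimiter=field_separator,
                            quotechar=text_qualifier,
                            escapechar=escape_character)
        return list(reader)
```

A quoted value such as "Acme, Inc." survives the comma split only because the qualifier matches the one used when the file was written, which is why the configuration here must mirror the source file.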
  7. When all the configurations in this section are complete, click PREVIEW. This fetches the first 10 records from the source file and displays them in the Preview Results window.
  8. Click LOAD COLUMN DETAILS. The Columns section appears as shown in the section below.
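The PREVIEW behaviour, reading only the header plus the first 10 data records, can be sketched as follows (the helper name is ours):

```python
import csv
from itertools import islice

def preview(path, n=10, delimiter=","):
    """Mimic PREVIEW: return the header row and only the first
    n data records of the source file, without reading the rest."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        return header, list(islice(reader, n))
```

Reading lazily with islice means a multi-gigabyte export is never loaded in full just to show a sample.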

Columns

When you click LOAD COLUMN DETAILS in the S3 File Configuration section, the following columns appear:

  • Column Header: Column headers that are fetched from the csv/tsv file in the S3 bucket.
  • Data Type: You can assign a data type to the field that is going to be created in the S3 Dataset. Gainsight supports creating fields of five data types:
    • String
    • Number
    • Boolean
    • Date
    • DateTime

Note: By default, new fields that are going to be created in the S3 dataset are assigned with String data type.

  • Output Field Label: You can assign a label to the field in the S3 dataset into which values from the csv/tsv file are fetched. For Number values, you can also set the number of decimal places.

If there are many fields for ingestion, you can search for and select the required fields using the Search box. You can select or remove source file column headers using the checkbox next to the Column Header name; do this when the source file in the S3 bucket has been updated and column headers were removed.

Columns.gif

Limitations

The following limitations apply in the Columns section:

  • The following data type mappings between CSV column headers and S3 dataset fields fail:

    • String to Number

    • DateTime to Date

    • Date to DateTime

    • String to Boolean: If value in the CSV column is not True, it loads the value as False in the S3 Dataset field.
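The String-to-Boolean rule above can be sketched as a one-line coercion. Whether the real check is case-sensitive is not stated in the article, so an exact match on "True" is assumed here:

```python
def coerce_boolean(raw):
    """Sketch of the documented String-to-Boolean behaviour: any CSV
    value other than 'True' loads as False in the S3 Dataset field.
    Exact-match semantics are an assumption, not confirmed behaviour."""
    return raw.strip() == "True"
```

Under this sketch, values such as "yes", "1", or even "False" all load as False, so source files should use the literal "True" for true values.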

Date Configuration

There are three configurations available in this section: Date, DateTime, and Timezone. Select these formats to match the Date, DateTime, and Timezone formats of the csv/tsv file records in the S3 bucket. To check the formats of the source csv/tsv file records, view the Preview dialog in the S3 File Configuration section. Following are the use cases for these configurations:

  • Date Format: Select yyyy-MM-dd if the format of the date values in the source file is 2018-02-18. Following are the date formats supported in the S3 dataset task for a csv/tsv file. 

    Supported formats:
    dd-MMM-yy
    dd-MMM-yyyy
    dd/MM/yy
    dd/MM/yyyy
    EEEE, dd MMMM yyyy
    M-d-yyyy
    M-dd-yy
    M/d/yy
    M/d/yyyy
    M/dd/yy
    M/dd/yyyy
    MM-dd-yyyy
    MM/dd/yyyy
    MMMM d, yyyy
    yyyy-M-d
    yyyy-MM-dd
    yyyy/M/d
    yyyyMMdd
     
  • DateTime Format: Select yyyy-MM-dd'T'HH:mm:ss['.'SSS][XXXXX] if the format of the datetime values in the source file is 2017-03-22T12:37:23-08:00. Following are the DateTime formats supported in the S3 dataset task for a csv/tsv file.

    Supported formats:
    dd/MM/yyyy hh:mm:ss['.'SSS][XXXXX] a
    yyyy/M/dd'T'HH:mm:ss['.'SSS][XXXXX]
    yyyy-M-dd'T'HH:mm:ss['.'SSS][XXXXX]
    yyyy/MM/dd'T'HH:mm:ss['.'SSS][XXXXX]
    yyyy-MM-dd'T'HH:mm:ss['.'SSS][XXXXX]
     
  • Time Zone: By default, the UTC time zone is selected in this field. If the datetime values in the source file (Example: 2017-03-22T12:37:23-08:00) carry a time zone offset (Example: -08:00), select the matching time zone, such as (GMT-08:00) Pacific Standard Time (America/Los_Angeles), in this field. If the datetime values in the source file do not have any time zone, select UTC.

Data loading into the S3 dataset fails if the Date and DateTime formats selected in the Date Configuration section are not the same as in the source file.
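As a sketch of the check this section implies, a source value can be validated against the selected format before ingestion. The mapping of Java-style patterns to Python strptime codes below is illustrative and covers only a few of the supported formats:

```python
from datetime import datetime

# Illustrative mapping of the article's Java-style patterns to Python
# strptime codes (only a small subset of the supported formats).
PATTERN_MAP = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "MM/dd/yyyy": "%m/%d/%Y",
    "yyyy-MM-dd'T'HH:mm:ssXXXXX": "%Y-%m-%dT%H:%M:%S%z",
}

def matches_selected_format(value, java_pattern):
    """Return True if a source value parses under the selected format,
    which is the precondition the Date Configuration imposes on records."""
    try:
        datetime.strptime(value, PATTERN_MAP[java_pattern])
        return True
    except ValueError:
        return False
```

For example, 2018-02-18 passes the yyyy-MM-dd check, while a dd-MM-yyyy value such as 18-02-2018 fails it, which is exactly the mismatch that causes data loading to fail.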

Date Configurations.gif

Once you have set up the configurations in the above three sections, click SAVE in the Setup Rule header to save the S3 Dataset task.

Save button.png

Once the S3 dataset is saved, you cannot change its configurations. You can only remove column headers after the dataset is saved; do this when the source file in the S3 bucket has been updated and column headers were removed.

If you have feedback or questions on this feature, please share them on community.gainsight.com.

 
