V-Lab @ ANDC

Creation of DataFrame

Aim

To create a DataFrame

Theory

A DataFrame is a two-dimensional, spreadsheet-like data structure.


It contains an ordered collection of rows and columns, where different columns can hold values of different types (numeric, string, Boolean, etc.). It can be created from lists, dictionaries, lists of dictionaries, etc.

In real-world use, a pandas DataFrame is usually created by loading a dataset from existing storage, such as an SQL table, a CSV file or an Excel file, which may contain time series data, stock exchange data, employee personnel details, etc. Although a DataFrame is physically two-dimensional, hierarchical indexing allows higher-dimensional data to be represented in a tabular style.
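As a minimal sketch of loading data from existing storage, the example below reads CSV text with pd.read_csv(). To keep it self-contained, the CSV content is a small hypothetical string wrapped in io.StringIO instead of a real file (with an actual file you would pass its path). Setting two columns as the row index then gives a hierarchical (MultiIndex) index.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """Dept,Name,Salary
Sales,Asha,50000
Sales,Ravi,52000
HR,Meena,48000
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Using two columns as the row index creates a hierarchical (MultiIndex) index;
# sorting the index keeps label-based lookups efficient
df_multi = df.set_index(["Dept", "Name"]).sort_index()
print(df_multi)

# Select all rows of one department through the outer index level
print(df_multi.loc["Sales"])
```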

A DataFrame in Python may also be created from existing data structures used in a program, and may contain values of different data types such as integers, floats, strings, Booleans and many more. We detail below six different methods that are commonly used for constructing a DataFrame.

Method #1:

Creating a DataFrame using the list data structure. A DataFrame may be created from a single list or from a list of lists. A single list produces one column, while in a list of lists each inner list becomes one row.

Example:
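A minimal sketch of this method, assuming hypothetical student data: the single list names gives a DataFrame df with one column, and the hypothetical list of lists rows gives one DataFrame row per inner list.

```python
import pandas as pd

# A single (hypothetical) list: each element becomes one row of a single column
names = ["Asha", "Ravi", "Meena"]
df = pd.DataFrame(names)
print(df)  # the only column gets the default label 0; rows are labelled 0, 1, 2

# A list of lists: each inner list becomes one row, columns are labelled 0, 1, ...
rows = [["Asha", 12], ["Ravi", 11]]
df2 = pd.DataFrame(rows)
print(df2)
```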

In the above code, a DataFrame df is created with the default column label 0, since no column name is explicitly provided while constructing it. The rows are also labelled with default index values starting from 0.

Method #2:

Creating a DataFrame from a dictionary, where every value in the dictionary must have the same length. In the following example, each value in the dictionary is a list.

Example:
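A minimal sketch of this method, assuming hypothetical student names and classes. It also shows that pandas raises a ValueError when the lists in the dictionary have different lengths.

```python
import pandas as pd

# Hypothetical data: every value in the dictionary is a list of the same length
data = {
    "Name": ["Asha", "Ravi", "Meena"],
    "Class": ["XI", "XII", "XI"],
}

# Each key becomes a column name; each list supplies that column's values
df = pd.DataFrame(data)
print(df)

# Lists of unequal length are rejected
try:
    pd.DataFrame({"Name": ["Asha", "Ravi"], "Class": ["XI"]})
except ValueError as err:
    print("ValueError:", err)
```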

In the above code, a DataFrame is created with two columns, 'Name' and 'Class', and the rows are labelled with default index values starting from 0.

Method #3:

Creating a DataFrame from an N-dimensional array (for example, a 2D NumPy array) in which all nested arrays have the same length. If an index is passed, its length must equal the number of rows, i.e. the number of nested arrays. If no index is passed, the index defaults to range(n), where n is the number of rows.

Example:
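A minimal sketch of this method, assuming hypothetical names, marks and ranks for four students and hypothetical column labels Student1 to Student4. The array is built with dtype=object so that the marks and ranks stay numbers; without it, NumPy would convert every element to a string because the rows mix text and integers.

```python
import numpy as np
import pandas as pd

# Hypothetical 2D array: row 0 = student names, row 1 = marks, row 2 = rank
narray = np.array(
    [
        ["Asha", "Ravi", "Meena", "John"],
        [88, 92, 75, 81],
        [2, 1, 4, 3],
    ],
    dtype=object,
)

# The arrays carry no names, so column labels are supplied explicitly
cols = ["Student1", "Student2", "Student3", "Student4"]
df = pd.DataFrame(narray, columns=cols)
print(df)  # no index given, so rows are labelled range(3)

# Optional row labels: the index length must equal the number of rows (3 here)
df_labelled = pd.DataFrame(narray, index=["Name", "Marks", "Rank"], columns=cols)
print(df_labelled)
```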

In the above example, narray is a 2D array in which row 0 holds the names of the students, row 1 the marks obtained and row 2 the rank obtained by each student. Note that the column names are given explicitly, since the arrays in narray carry no names of their own. The resulting DataFrame therefore has four columns, one for each student.

Method #4:

Creating a DataFrame from a list of dictionaries. A pandas DataFrame can be created by passing a list of dictionaries as input data. By default, the dictionary keys are taken as column names and each dictionary becomes one row.

Example:
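A minimal sketch of this method, assuming hypothetical student records. The third dictionary has an extra key, Grade, to show that a key missing from some dictionaries still becomes a column, with NaN in the rows that lack it.

```python
import pandas as pd

# Hypothetical records: each dictionary becomes one row
records = [
    {"Name": "Asha", "Marks": 88},
    {"Name": "Ravi", "Marks": 92},
    {"Name": "Meena", "Marks": 75, "Grade": "B"},
]

# Keys become column names; a key missing from a dictionary gives NaN in that row
df = pd.DataFrame(records)
print(df)
```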

Method #5:

Creating a DataFrame from lists using the zip() function.

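In this method, the user combines lists with Python's built-in zip() function, which pairs up the corresponding elements of the lists. The resulting sequence of tuples is then passed to the DataFrame constructor, where each tuple becomes one row.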

Example:
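A minimal sketch of this method, assuming two hypothetical lists of names and marks. Note that zip() stops at the end of the shorter list, so the lists should normally have the same length.

```python
import pandas as pd

# Two hypothetical lists of equal length
names = ["Asha", "Ravi", "Meena"]
marks = [88, 92, 75]

# zip() pairs corresponding elements: ("Asha", 88), ("Ravi", 92), ...
pairs = list(zip(names, marks))

# Each tuple becomes one row; column names are supplied explicitly
df = pd.DataFrame(pairs, columns=["Name", "Marks"])
print(df)
```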

Method #6:

Creating a DataFrame from a dictionary of pandas Series.

In Method #2, a DataFrame was created from a dictionary of lists. Here, we first create a dictionary whose values are pandas Series, and then use it to create a DataFrame. The resulting DataFrame has one column per key (three in this example), and the number of rows equals the length of each Series when all Series share the same index.

Example:
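A minimal sketch of this method, assuming three hypothetical Series (Name, Marks, Rank). The second part uses hypothetical index labels "a", "b", "c" to show that pandas aligns the Series on their index labels and fills a missing label with NaN.

```python
import pandas as pd

# Hypothetical dictionary whose values are pandas Series sharing the index 0..2
data = {
    "Name": pd.Series(["Asha", "Ravi", "Meena"]),
    "Marks": pd.Series([88, 92, 75]),
    "Rank": pd.Series([2, 1, 3]),
}

# One column per key; rows are aligned on the Series index labels
df = pd.DataFrame(data)
print(df)

# If a Series lacks a label present in another, the gap is filled with NaN
partial = pd.DataFrame({
    "Marks": pd.Series([88, 92, 75], index=["a", "b", "c"]),
    "Rank": pd.Series([2, 1], index=["a", "b"]),
})
print(partial)
```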

Procedure

    The steps to follow are:

  1. Open the website by clicking Simulation.
  2. From the drop-down options, choose a method for creating a DataFrame.
  3. You may also modify the code to try out your own examples.
  4. DO NOT TYPE ANYTHING IN THE INSTRUCTIONS BOX BELOW (not required for this lab experiment).
  5. To see the output in the output window, press the run button.

Practice

Instructions:

Just click the next button to see which element goes to which position.

On running: pandas.DataFrame([[1,2,3],[4,5,6],[7,8,9]], [0,1,2], ['One','Two','Three'])

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Name of Columns: 'One', 'Two', 'Three'

Row Index    'One'    'Two'    'Three'
0            1        2        3
1            4        5        6
2            7        8        9
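The pandas.DataFrame call shown above passes data, index and columns positionally, in that order. As a small sketch, the same call can be written with keyword arguments, and the positions listed in the observation can be checked with .iloc (integer positions) and .loc (labels).

```python
import pandas as pd

# Same call as above, with the positional arguments named explicitly
df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[0, 1, 2],
    columns=["One", "Two", "Three"],
)
print(df)

# Position [row, column] corresponds to df.iloc[row, column]
print(df.iloc[0, 2])  # element 3 of sub-list 1
print(df.iloc[2, 0])  # element 7 of sub-list 3

# The same elements can be reached by label with .loc
print(df.loc[0, "Three"])
print(df.loc[2, "One"])
```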

Observation

The start of row 0

The element 1 of sub-list 1 is assigned to [0, 0] in the DataFrame

The element 2 of sub-list 1 is assigned to [0, 1] in the DataFrame

The element 3 of sub-list 1 is assigned to [0, 2] in the DataFrame

The end of row 0

The start of row 1

The element 4 of sub-list 2 is assigned to [1, 0] in the DataFrame

The element 5 of sub-list 2 is assigned to [1, 1] in the DataFrame

The element 6 of sub-list 2 is assigned to [1, 2] in the DataFrame

The end of row 1

The start of row 2

The element 7 of sub-list 3 is assigned to [2, 0] in the DataFrame

The element 8 of sub-list 3 is assigned to [2, 1] in the DataFrame

The element 9 of sub-list 3 is assigned to [2, 2] in the DataFrame

The end of row 2

Quiz


Q1. Which of the following is used to give user-defined column labels in a DataFrame?


  • index
  • columns
  • column

Q2. In a DataFrame, a new column is added by default as the _____________ column.

  • First (Left Side)
  • Second
  • Last (Right Side)

Q3. We can create a DataFrame from _____

  • NumPy arrays
  • List of Dictionaries
  • All of the above

Q4. Which library must be imported to create a DataFrame?

  • Python
  • Pandas
  • Random

Q5. We can add a new row to a DataFrame using the _____________ method

  1. rloc[ ]
  2. iloc[ ]
  3. loc[ ]

Result

Hence, a DataFrame can be created by any of these methods. Data represented in tabular form is often preferred over linear arrays, since its rows and columns carry labels that make the data easier to read and access.

Team

Mr. Sushant Sahrma, B.Sc Physical Sciences with Computer Science, III year,
Ms. Alia, B.Sc Physical Sciences with Computer Science, II year.

Mentor:
Prof. Sharanjit Kaur,
Ms. Gunjan Rani
