11 Pandas Udemy Flashcards

1
Q

Pandas is an open source library built on top of NumPy

I Allows for fas analysis and data cleaning and preparation

It excels in performance and productivity

It also has built-in visualization features

It can work with data from a wide variety of sources

A

Pandas é o Excel do Python

ou

Versão Python dos R DataFrames

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

# Creating a Series

labels = [‘a’,’b’,’c’]
my_list = [10,20,30]
pd.Series(data=my_list,index=labels)
==>

### pd.Series(dic) se dic = {'a':10,'b':20,'c':30}
também é a mesma coisa (já passa data e index)
A

a 10
b 20
c 30
dtype: int64

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Data in a Series

Even functions (although unlikely that you will use this)

pd.Series([sum,print,len])
==>

A

0
1
2
dtype: object

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

ser1 = pd.Series([1,2,3,4],index = [‘USA’, ‘Germany’,’USSR’, ‘Japan’])
ser2 = pd.Series([1,2,5,4],index = [‘USA’, ‘Germany’,’Italy’, ‘Japan’])
ser1 + ser2
==>

A
Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64

Notar que o nome do país é o index e o número é a data

NaN - Not a Number

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
df.round(2)
==>
W	X	Y	Z
A	2.71	0.63	0.91	0.50
B	0.65	-0.32	-0.85	0.61
C	-2.02	0.74	0.53	-0.59
D	0.19	-0.76	-0.93	0.96
E	0.19	1.98	2.61	0.68
A
df['W'].round(2)
A    2.71
B    0.65
C   -2.02
D    0.19
E    0.19
Name: W, dtype: float64
# SQL Syntax (NOT RECOMMENDED!)
df.W
# sinônimo da anterior
# confunde com métodos se usar

type(df[‘W’])
==>
pandas.core.series.Series

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q
df ==>
W	X	Y	Z
A	2.71	0.63	0.91	0.50
B	0.65	-0.32	-0.85	0.61
C	-2.02	0.74	0.53	-0.59
D	0.19	-0.76	-0.93	0.96
E	0.19	1.98	2.61	0.68
A
df['new'] = df['Z'] + 1
==>
W	X	Y	Z	new
A	2.71	0.63	0.91	0.50	1.50
B	0.65	-0.32	-0.85	0.61	1.61
C	-2.02	0.74	0.53	-0.59	0.41
D	0.19	-0.76	-0.93	0.96	1.96
E	0.19	1.98	2.61	0.68	1.68
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q
df ==>
W	X	Y	Z	new
A	0.30	1.69	-1.71	-1.16	-0.16
B	-0.13	0.39	0.17	0.18	1.18
C	0.81	0.07	0.64	0.33	1.33
D	-0.50	-0.75	-0.94	0.48	1.48
E	-0.12	1.90	0.24	2.00	3.00
A
df.drop('new',axis=1)
==> 
	W	X	Y	Z
A	0.30	1.69	-1.71	-1.16
B	-0.13	0.39	0.17	0.18
C	0.81	0.07	0.64	0.33
D	-0.50	-0.75	-0.94	0.48
E	-0.12	1.90	0.24	2.00
# Not inplace unless specified!
df
	W	X	Y	Z	new
A	0.30	1.69	-1.71	-1.16	-0.16
B	-0.13	0.39	0.17	0.18	1.18
C	0.81	0.07	0.64	0.33	1.33
D	-0.50	-0.75	-0.94	0.48	1.48
E	-0.12	1.90	0.24	2.00	3.00
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q
df ==>
W	X	Y	Z	new
A	0.30	1.69	-1.71	-1.16	-0.16
B	-0.13	0.39	0.17	0.18	1.18
C	0.81	0.07	0.64	0.33	1.33
D	-0.50	-0.75	-0.94	0.48	1.48
E	-0.12	1.90	0.24	2.00	3.00

df.drop(‘new’,axis=1,inplace=True)
==>

df
==>

A
W	X	Y	Z
A	0.30	1.69	-1.71	-1.16
B	-0.13	0.39	0.17	0.18
C	0.81	0.07	0.64	0.33
D	-0.50	-0.75	-0.94	0.48
E	-0.12	1.90	0.24	2.00

se inplace=True a mudança ocorre no arquivo original também

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q
df ==>
W	X	Y	Z
A	0.30	1.69	-1.71	-1.16
B	-0.13	0.39	0.17	0.18
C	0.81	0.07	0.64	0.33
D	-0.50	-0.75	-0.94	0.48
E	-0.12	1.90	0.24	2.00

df.drop(‘E’,axis=0)
==>

A
W	X	Y	Z
A	0.30	1.69	-1.71	-1.16
B	-0.13	0.39	0.17	0.18
C	0.81	0.07	0.64	0.33
D	-0.50	-0.75	-0.94	0.48
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q
df ==>
W	X	Y	Z
A	0.30	1.69	-1.71	-1.16
B	-0.13	0.39	0.17	0.18
C	0.81	0.07	0.64	0.33
D	-0.50	-0.75	-0.94	0.48
E	-0.12	1.90	0.24	2.00
A

df.loc[‘A’] # por nome

** Selecting Rows**

df.iloc[0] # por index

==>

W    0.30
X    1.69
Y   -1.71
Z   -1.16
Name: A, dtype: float64
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q
df ==>
W	X	Y	Z
A	0.30	1.69	-1.71	-1.16
B	-0.13	0.39	0.17	0.18
C	0.81	0.07	0.64	0.33
D	-0.50	-0.75	-0.94	0.48
E	-0.12	1.90	0.24	2.00

df.loc[[‘A’,’B’],[‘W’,’Y’]]
==>

A

W Y
A 0.30 -1.71
B -0.13 0.17

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q
df ==>
W	X	Y	Z
A	2.71	0.63	0.91	0.50
B	0.65	-0.32	-0.85	0.61
C	-2.02	0.74	0.53	-0.59
D	0.19	-0.76	-0.93	0.96
E	0.19	1.98	2.61	0.68

df>0 ==>

W	X	Y	Z
A	True	True	True	True
B	True	False	False	True
C	False	True	True	False
D	True	False	False	True
E	True	True	True	True
A
df[df>0]
==>
W	X	Y	Z
A	2.71	0.63	0.91	0.50
B	0.65	NaN	NaN	0.61
C	NaN	0.74	0.53	NaN
D	0.19	NaN	NaN	0.96
E	0.19	1.98	2.61	0.68
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q
df ==>
W	X	Y	Z
A	2.71	0.63	0.91	0.50
B	0.65	-0.32	-0.85	0.61
C	-2.02	0.74	0.53	-0.59
D	0.19	-0.76	-0.93	0.96
E	0.19	1.98	2.61	0.68
df[df['W']>0][['Y','X']]
==>
	Y	X
A	0.91	0.63
B	-0.85	-0.32
D	-0.93	-0.76
E	2.61	1.98
A

Não existe “and” que é substituído por “&”

df[(df[‘W’]>0) & (df[‘Y’] > 1)]
==>
W X Y Z new
E 0.19 1.98 2.61 0.68 1.68

Não existe “or” que é substituído por “|”

df[(df['W']>0) | (df['Y'] > 1)]
W	X	Y	Z	new
A	2.71	0.63	0.91	0.50	1.50
B	0.65	-0.32	-0.85	0.61	1.61
D	0.19	-0.76	-0.93	0.96	1.96
E	0.19	1.98	2.61	0.68	1.68
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q
df ==>
W	X	Y	Z
A	2.71	0.63	0.91	0.50
B	0.65	-0.32	-0.85	0.61
C	-2.02	0.74	0.53	-0.59
D	0.19	-0.76	-0.93	0.96
E	0.19	1.98	2.61	0.68

df.reset_index()
==>

A
index	W	X	Y	Z
0	A	2.71	0.63	0.91	0.50
1	B	0.65	-0.32	-0.85	0.61
2	C	-2.02	0.74	0.53	-0.59
3	D	0.19	-0.76	-0.93	0.96
4	E	0.19	1.98	2.61	0.68

(inplace=True) se quiser que mude no original

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

newind = ‘CA NY WY OR CO’.split()
newind
==>
[‘CA’, ‘NY’, ‘WY’, ‘OR’, ‘CO’]

df[‘States’] = newind
df
==>

inplace=True se quiser fazer valer na página ao lado

A
W	X	Y	Z	States
A	2.71	0.63	0.91	0.50	CA
B	0.65	-0.32	-0.85	0.61	NY
C	-2.02	0.74	0.53	-0.59	WY
D	0.19	-0.76	-0.93	0.96	OR
E	0.19	1.98	2.61	0.68	CO
df.set_index('States')
==>
W	X	Y	Z
States				
CA	2.71	0.63	0.91	0.50
NY	0.65	-0.32	-0.85	0.61
WY	-2.02	0.74	0.53	-0.59
OR	0.19	-0.76	-0.93	0.96
CO	0.19	1.98	2.61	0.68

a linha States fica vazia

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Multi-Index and Index Hierarchy
Let us go over how to work with Multi-Index, first we’ll create a quick example of what a Multi-Indexed DataFrame would look like:

Index Levels
outside = [‘G1’,’G1’,’G1’,’G2’,’G2’,’G2’]
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)

hier_index ==>
MultiIndex([('G1', 1),
            ('G1', 2),
            ('G1', 3),
            ('G2', 1),
            ('G2', 2),
            ('G2', 3)],
           )
df = pd.DataFrame(np.random.randn(6,2),
index=hier_index,columns=['A','B'])
df = df.round(2)
df
==>
		A	        B
G1	1	-0.50	-0.75
        2	-0.94	0.48
        3	-0.12	1.90
G2	1	0.24	2.00
        2	-0.99	0.20
        3	-1.14  	0.00
A

Now let’s show how to index this! For index hierarchy we use df.loc[], if this was on the columns axis, you would just use normal bracket notation df[]. Calling one level of the index returns the sub-dataframe:

df.loc['G1'].loc[1]
==>
A    0.153661
B    0.167638
Name: 1, dtype: float64

df.index.names ==>
FrozenList([None, None])

df.index.names = ['Group','Num']
df
==>
		            A	         B
Group	Num		
G1	           1	-0.50	-0.75
                   2	-0.94	0.48
                   3	-0.12	1.90
G2          	   1	0.24	2.00
                    2 -0.99	0.20
                    3	-1.14	        0.00

df.loc[‘G2’].loc[2].loc[‘B’]
==>
0.2

# xs = Cross Section
df.xs(1,level='Num')
==>
	     A	B
Group		
G1	       -0.50	-0.75
G2	        0.24	2.00
17
Q
df = pd.DataFrame({'A':[1,2,np.nan],
                  'B':[5,np.nan,np.nan],
                  'C':[1,2,3]})
df ==>
	A	  B	        C
0	1.0	  5.0	        1
1	2.0	  NaN	2
2	NaN  NaN	3
A

deleta todas as linhas com NaN
df.dropna() ==>
A B C
0 1.0 5.0 1

# deleta todas as colunas com NaN
df.dropna(axis=1)
=>
	C
0	1
1	2
2	3
18
Q
df ==>
	A	  B	        C
0	1.0	  5.0	        1
1	2.0	  NaN	2
2	NaN  NaN	3

df.dropna(thresh=2)
==>

A

A B C
0 1.0 5.0 1
1 2.0 NaN 2

se thresh=2, só deleta a partir de 2 NaN (null/none)

19
Q
df ==>
	A	  B	        C
0	1.0	  5.0	        1
1	2.0	  NaN	2
2	NaN  NaN	3

df.fillna(value=’FILL VALUE’)
==>

A

A B C
0 1 5 1
1 2 FILL VALUE 2
2 FILL VALU FILL VALUE 3

substitui os NaN pelo escrito em value=

20
Q
df ==>
	A	  B	        C
0	1.0	  5.0	        1
1	2.0	  NaN	2
2	NaN  NaN	3

df[‘A’].fillna(value=df[‘A’].mean())
==>

A

0 1.0
1 2.0
2 1.5
Name: A, dtype: float64

substituiu pela média do resto da coluna A

21
Q

Create dataframe
data = {‘Company’:[‘GGL’,’GGL’,’IBM’,’IBM’,’FB’,’FB’],
‘Person’:[‘Sam’,’Cass’,’Amy’,’Van’,’Carl’,’Sara’],
‘Sales’:[200,120,340,124,243,350]}
df = pd.DataFrame(data)
df ==>

Company	Person	Sales
0	GGL	Sam	        200
1	GGL	Cass	120
2	IBM	        Amy	340
3	IBM	        Van	         124
4	FB	        Carl	        243
5	FB	        Sara	350

** Now you can use the .groupby() method to group rows together based off of a column name. For instance let’s group based off of Company. This will create a DataFrameGroupBy object:**
df.groupby(‘Company’)
==> ERROR

A
# You can save this object as a new variable:
by_comp = df.groupby("Company")
==>
# And then call aggregate methods off the object:
by_comp.mean() ==>
	             Sales
Company	
FB	              296.5
GGL	      160.0
IBM	              232.0

linha única
df.groupby(‘Company’).mean()
==>
(junção das duas anteriores)

sum(), std(), min(), max(), count(), describe()

by_comp.count().loc[‘FB’] ==>
Person 2
Sales 2
Name: FB, dtype: int64

# by_comp.describe().transpose()['GGL']
Sales  count      2.000000
       mean     160.000000
       std       56.568542
       min      120.000000
       25%      140.000000
       50%      160.000000
       75%      180.000000
       max      200.000000
Name: GGL, dtype: float64
# mostra na vertical e não na horizontal os dados
22
Q
df1 ==>
        A	B	C	D
0	A0	B0	C0	D0
1	A1	B1	C1	D1
2	A2	B2	C2	D2
3	A3	B3	C3	D3
df2 ==>
         A	B	C	D
4	A4	B4	C4	D4
5	A5	B5	C5	D5
6	A6	B6	C6	D6
7	A7	B7	C7	D7

pd.concat([df1,df2]) ==>

A
A	B	C	D
0	A0	B0	C0	D0
1	A1	B1	C1	D1
2	A2	B2	C2	D2
3	A3	B3	C3	D3
4	A4	B4	C4	D4
5	A5	B5	C5	D5
6	A6	B6	C6	D6
7	A7	B7	C7	D7
# default is axis=0 (rows)
# padrão é eixo 0, linhas
23
Q
df1 ==>
        A	B	C	D
0	A0	B0	C0	D0
1	A1	B1	C1	D1
2	A2	B2	C2	D2
3	A3	B3	C3	D3
df2 ==>
         A	B	C	D
4	A4	B4	C4	D4
5	A5	B5	C5	D5
6	A6	B6	C6	D6
7	A7	B7	C7	D7

pd.concat([df1,df2], axis=1) ==>

A
A	B	C	D	A	B	C	D
0	A0	B0	C0	D0	NaN	NaN	NaN	NaN
1	A1	B1	C1	D1	NaN	NaN	NaN	NaN
2	A2	B2	C2	D2	NaN	NaN	NaN	NaN
3	A3	B3	C3	D3	NaN	NaN	NaN	NaN
4	NaN	NaN	NaN	NaN	A4	B4	C4	D4
5	NaN	NaN	NaN	NaN	A5	B5	C5	D5
6	NaN	NaN	NaN	NaN	A6	B6	C6	D6
7	NaN	NaN	NaN	NaN	A7	B7	C7	D7

axis 1 é o das colunas

24
Q
left ==>
        key	A	B
0	K0	A0	B0
1	K1	A1	B1
2	K2	A2	B2
3	K3	A3	B3
right ==>
	key	C	D
0	K0	C0	D0
1	K1	C1	D1
2	K2	C2	D2
3	K3	C3	D3

pd.merge(left,right,how=’inner’,on=’key’)
==>

A
A	B	key	C	D
0	A0	B0	K0	C0	D0
1	A1	B1	K1	C1	D1
2	A2	B2	K2	C2	D2
3	A3	B3	K3	C3	D3
# Merging
The merge function allows you to merge DataFrames together using a similar logic as merging SQL Tables together. For example:

Merging se baseia na coluna e não no Index

25
Q
left ==>
	key1	   key2	A	B
0	K0	  K0	       A0	B0
1	K0	  K1	       A1	B1
2	K1	  K0	       A2	B2
3	K2	  K1	       A3	B3
right ==>
	key1	 key2	C	D
0	K0	 K0	        C0	D0
1	K1	 K0	        C1	D1
2	K1	 K0	        C2	D2
3	K2	 K0	        C3	D3

pd.merge(left, right, on=[‘key1’, ‘key2’])
==>

A

key1 key2A B C D
0 K0 K0 A0 B0 C0 D0
1 K1 K0 A2 B2 C1 D1
2 K1 K0 A2 B2 C2 D2

notar que juntou L0 com R0, L2 com R1, L2 com R2 (linha 2 de Right)

26
Q
left ==>
	key1	   key2	A	B
0	K0	  K0	       A0	B0
1	K0	  K1	       A1	B1
2	K1	  K0	       A2	B2
3	K2	  K1	       A3	B3
right ==>
	key1	 key2	C	D
0	K0	 K0	        C0	D0
1	K1	 K0	        C1	D1
2	K1	 K0	        C2	D2
3	K2	 K0	        C3	D3

pd.merge(left, right, how=’outer’, on=[‘key1’, ‘key2’]) ==>

A
key1	key2    A	       B	C	   D
0	K0	K0	   A0	B0	C0	     D0
1	K0	K1	    A1	B1	NaN	NaN
2	K1	K0	   A2	B2	C1	      D1
3	K1	K0	   A2	B2	C2	     D2
4	K2	K1	   A3	B3	NaN	NaN
5	K2	K0	   NaN	NaN  C3   	D3
27
Q
left ==>
	key1	   key2	A	B
0	K0	  K0	       A0	B0
1	K0	  K1	       A1	B1
2	K1	  K0	       A2	B2
3	K2	  K1	       A3	B3
right ==>
	key1	 key2	C	D
0	K0	 K0	        C0	D0
1	K1	 K0	        C1	D1
2	K1	 K0	        C2	D2
3	K2	 K0	        C3	D3

pd.merge(left, right, how=’left’, on=[‘key1’, ‘key2’]) ==>

A
A	B	key1	key2   C   	D
0	A0	B0	K0	K0	   C0	D0
1	A1	B1	K0	K1	   NaN	NaN
2	A2	B2	K1	K0	   C1	        D1
3	A2	B2	K1	K0	   C2        D2
4	A3	B3	K2	K1	   NaN	NaN

Tem que ter na da esquerda, se how=’right’seria o contrário

28
Q
left ==>
        A	B
K0	A0	B0
K1	A1	B1
K2	A2	B2
right ==>
        C	D
K0	C0	D0
K2	C2	D2
K3	C3	D3
A
left.join(right) ==>
         A	B	C	  D
K0	A0	B0	C0	  D0
K1	A1	B1	NaN  NaN
K2	A2	B2	C2	  D2
# Joining
Joining is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.
# Diferença de Joining pra Merge é que Joining usa Index e Merge usa coluna
29
Q
left ==>
        A	B
K0	A0	B0
K1	A1	B1
K2	A2	B2
right ==>
        C	D
K0	C0	D0
K2	C2	D2
K3	C3	D3
A
left.join(right, how='outer') ==>
        A	B	C	  D
K0	A0	B0	C0	  D0
K1	A1	B1	NaN  NaN
K2	A2	B2	C2	  D2
K3	NaN NaN C3	  D3
# Joining
Joining is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.
# Diferença de Joining pra Merge é que Joining usa Index e Merge usa coluna
30
Q
df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})
df.head()
==>
      col1	col2	   col3
0	1	444	   abc
1	2	555	   def
2	3	666	   ghi
3	4	444	   xyz
A

df[‘col2’].unique()
==>
array([444, 555, 666], dtype=int64)

df['col2'].nunique()
==
len(df['col2'].unique())
==>
3
df['col2'].value_counts()
==>
df['col2'].value_counts()
444    2
555    1
666    1
Name: col2, dtype: int64
31
Q
df ==>
==>
      col1	col2	   col3
0	1	444	   abc
1	2	555	   def
2	3	666	   ghi
3	4	444	   xyz

newdf = df[(df[‘col1’]>2) & (df[‘col2’]==444)]
newdf
==>

A

col1 col2 col3

3 4 444 xyz

32
Q
df ==>
      col1	col2	   col3
0	1	444	   abc
1	2	555	   def
2	3	666	   ghi
3	4	444	   xyz
def times2(x):
    return x*2 ==>
A
df['col1'].apply(times2) ==>
0    3
1    3
2    3
3    3
Name: col3, dtype: int64

df[‘col1’].sum() ==>
10

df['col2'].apply(lambda x: x*2) ==>
0      888
1       1110
2      1332
3      888
Name: col2, dtype: int64
33
Q

** Permanently Removing a Column**

A

del df[‘col1’]

34
Q
df ==>
       col1	  col2  	col3
0	1	  444	 abc
1	2	  555	 def
2	3	  666	 ghi
3	4	  444	 xyz
df.sort_values(by='col2') 
==
df.sort_values('col2') 
#inplace=False by default
==>
A
col1	col2   col3
0	1	444	  abc
3	4	444	  xyz
1	2	555	  def
2	3	666	  ghi
35
Q

PIVOT TABLES

data = {‘A’:[‘foo’,’foo’,’foo’,’bar’,’bar’,’bar’],
‘B’:[‘one’,’one’,’two’,’two’,’one’,’one’],
‘C’:[‘x’,’y’,’x’,’y’,’x’,’y’],
‘D’:[1,3,2,5,4,1]}

df = pd.DataFrame(data)
df ==>
         A	B	C	D
0	foo	one	x	1
1	foo	one	y	3
2	foo	two	x	2
3	bar	two	y	5
4	bar	one	x	4
5	bar	one	y	1
A
C	x	y
A	B		
bar	one	4.0	1.0
        two	NaN 5.0
foo	one	1.0	3.0
        two	2.0	NaN
36
Q

PIVOT TABLES

data = {‘A’:[‘foo’,’foo’,’foo’,’bar’,’bar’,’bar’],
‘B’:[‘one’,’one’,’two’,’two’,’one’,’one’],
‘C’:[‘x’,’y’,’x’,’y’,’x’,’y’],
‘D’:[1,3,2,5,4,1]}

df = pd.DataFrame(data)
df ==>
         A	B	C	D
0	foo	one	x	1
1	foo	one	y	3
2	foo	two	x	2
3	bar	two	y	5
4	bar	one	x	4
5	bar	one	y	1

df.pivot_table(values=’D’,index=[‘A’, ‘B’],columns=[‘C’]) ==>

A
C	x	y
A	B		
bar	one	4.0	1.0
        two	NaN 5.0
foo	one	1.0	3.0
        two	2.0	NaN
37
Q
Panda can work with a variety of files:
CSV
Excel
HTML
SQL
A

df = pd.read_csv(‘example’)
df
==>
Pandas pode ler de vários outros tipos de arquivos

df.to_csv(‘example’,index=False)
????

38
Q

Excel
Pandas can read and write excel files, keep in mind, this only imports data. Not formulas or images, having images or macros may cause this read_excel method to crash.

A

Não importa imagens, gráficos ou fórmulas

Pandas só importa dados do Excel